classification method applied to microarrays (CMA package)
@juan-c-oliveros-collazos-2665
Dear all,

I am starting to use the CMA package for classification of microarray samples.

In particular, I want to know which genes are mainly responsible for separating about 60 lists of expression values into 2 groups that are already known. I understand that an SVM is a good method for finding the hyperplane that best separates the two groups, but what I need are the genes, not the hyperplane parameters.

My questions are:

To get a list of genes, should I somehow use SVMs (or another classification method), or do I simply need to identify the "informative" genes using the GeneSelection function of the CMA package?

If so, are the learning sets needed? Why?

Any recommendation for choosing a gene selection method?

Thanks in advance.

Best,

Juan Carlos Oliveros
CNB-CSIC, Madrid, Spain
Classification CMA • 1.1k views
@stephen-henderson-3758
In reply to Juan Carlos Oliveros Collazos (27 Oct 2009, 11:21):

The SVM is a reasonable classifier that performs OK on microarray data and usually requires no tuning of parameters, although the same is true of many others.

In order to understand the GeneSelection method you need to understand cross-validation (this occurs within the classification function). Cross-validation estimates the classification error by splitting the data into many training and test set combinations. The model, here your SVM, is built on the training set and then tested against the test set to see how many classification errors are made.

If you choose GeneSelection (which you probably should), then the data is reduced to a subset of features/genes based on a simple statistic. However, not just one set of genes is selected: genes are selected separately for every training set in the cross-validation. This is why the learning sets are needed. If the genes were instead chosen once using all samples, the test samples would already have influenced the selection, and the estimated SVM misclassification error would be too optimistic (an underestimate of the true error).

So when you use the toplist function on your GeneSelection object you will find a number of feature lists, none exactly the same. The 'informative' genes are those that occur most frequently in the toplists. You can examine the GeneSelection toplist before you run the classification function, but obviously you will want to run the classification function to check that the features are indeed 'informative'.

You can use the GeneSelection method that gives the lowest cross-validation error. I'd start with limma, but if there is a reasonable separation of classes then the different methods should work similarly.

Jeez, I hope that is clear...

Stephen Henderson
UCL
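To make the "genes that occur most frequently in the toplists" idea concrete, here is a minimal sketch in plain Python. It is not CMA code and does not reproduce CMA's API: the function names (`welch_t`, `top_genes`, `selection_frequency`), the toy data, and the choice of a Welch t-statistic as the "simple stat" are all assumptions made for illustration. It only shows the principle of ranking genes on each training split separately and then counting how often each gene makes the top-k list.

```python
import random
import statistics
from collections import Counter


def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    if denom == 0:
        return 0.0
    return abs(statistics.mean(a) - statistics.mean(b)) / denom


def top_genes(X, y, train_idx, k):
    """Rank genes using the training samples only; return the top-k gene indices."""
    g0 = [i for i in train_idx if y[i] == 0]
    g1 = [i for i in train_idx if y[i] == 1]
    n_genes = len(X[0])
    scores = [
        welch_t([X[i][g] for i in g0], [X[i][g] for i in g1])
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def selection_frequency(X, y, n_folds=5, k=3, seed=0):
    """Count how often each gene appears in the per-fold top-k lists."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    # Each fold is held out once; selection sees only the remaining samples.
    folds = [idx[f::n_folds] for f in range(n_folds)]
    counts = Counter()
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        counts.update(top_genes(X, y, train, k))
    return counts


# Hypothetical toy data: 60 samples, 20 genes, two known groups.
# Only genes 0 and 1 truly differ between the groups.
data_rng = random.Random(1)
y = [0] * 30 + [1] * 30
X = [
    [data_rng.gauss(3.0 * label if g < 2 else 0.0, 1.0) for g in range(20)]
    for label in y
]

freq = selection_frequency(X, y)
# Genes listed with how many of the 5 folds selected them.
print(freq.most_common(5))
```

In this toy setting the two truly different genes should be picked in every fold, while the remaining slot in each top-3 list is filled by whichever noise gene happens to score highest on that training split. That instability of the weaker picks is the reason for looking at selection frequency across folds rather than at any single list.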