Classification method applied to microarrays (CMA package)

Juan C Oliveros Collazos

Dear all,

I have just started using the CMA package to classify microarray samples. In particular, I want to know which genes are mainly responsible for separating about 60 lists of expression values into 2 groups that are already known. I understand that an SVM is a good method for finding the hyperplane that best separates the two groups, but what I need are the genes, not the hyperplane parameters.

My questions are:

To get a list of genes, should I use SVMs (or another classification method) in some way, or do I simply need to identify the "informative" genes with the GeneSelection function of the CMA package?

If so, are learning sets needed, and why?

Any recommendations for choosing a gene selection method?

Thanks in advance.

Best,
Juan Carlos Oliveros
CNB-CSIC, Madrid, Spain

Stephen Henderson

The SVM is a reasonable classifier that performs well enough on microarray data and usually needs no parameter tuning, although many other classifiers do too.

To understand the GeneSelection method you need to understand cross-validation, which takes place within the classification function. Cross-validation estimates the classification error by splitting the data into many combinations of training and test sets. The model (your SVM) is built on each training set and then applied to the matching test set to count how many samples are misclassified. In CMA these splits are the learning sets (created with GenerateLearningsets), which is why GeneSelection asks for them.

If you use GeneSelection (which you probably should), the data are reduced to a subset of features/genes ranked by a simple statistic. However, not just one set of genes is selected: a separate set is chosen for every training set in the cross-validation. Otherwise the test samples would have influenced which genes were kept, and the estimated SVM misclassification error would be an underestimate (overly optimistic). So when you use the toplist function on your GeneSelection object, you will find a number of feature lists, none of them exactly the same. The 'informative' genes are those that occur most frequently across these toplists.

You can examine the GeneSelection toplists before you run the classification function, but you will obviously want to run the classification function to check that the features really are 'informative'. You can then choose the GeneSelection method that gives the lowest cross-validation error. I'd start with limma, but if the classes are reasonably well separated the methods should perform similarly.

Jeez, I hope that is clear...

Stephen Henderson
UCL

(Replying to Juan Carlos Oliveros Collazos's message of 27 Oct 2009, 11:21.)
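To make per-fold gene selection and the "most frequent genes" idea concrete, here is a minimal sketch in Python rather than R. It is not CMA's API and not an SVM. All function names (`simulate`, `top_genes`, `centroid_errors`, `cross_validate`) are hypothetical. The gene ranking is a plain Welch t-statistic standing in for a GeneSelection scheme, and a nearest-centroid rule stands in for the SVM. The data are simulated: 60 samples and 1000 genes, with 5 genes shifted in one class. The sketch ranks genes separately inside every training set and counts how often each gene is picked across folds. For comparison, it also runs a variant that ranks genes once on all samples before cross-validating.

```python
import random
from collections import Counter


def mean_var(vals):
    """Sample mean and (n - 1)-denominator variance of a list of floats."""
    m = sum(vals) / len(vals)
    return m, sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def top_genes(X, y, rows, k):
    """Rank genes by |Welch t| using ONLY the samples in `rows`; return the k best gene indices."""
    a_rows = [i for i in rows if y[i] == 0]
    b_rows = [i for i in rows if y[i] == 1]
    scored = []
    for g in range(len(X[0])):
        ma, va = mean_var([X[i][g] for i in a_rows])
        mb, vb = mean_var([X[i][g] for i in b_rows])
        se = (va / len(a_rows) + vb / len(b_rows)) ** 0.5
        scored.append((abs(ma - mb) / se if se > 0 else 0.0, g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]


def centroid_errors(X, y, train, test, genes):
    """Stand-in classifier (nearest class mean, NOT an SVM): count misclassified test samples."""
    centroids = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        centroids[c] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    errors = 0
    for i in test:
        dist = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c])) for c in (0, 1)}
        errors += min(dist, key=dist.get) != y[i]
    return errors


def cross_validate(X, y, k, n_folds=5, select_inside=True, seed=1):
    """n-fold CV error rate plus a Counter of how often each gene was selected."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    # Biased variant: genes ranked once on ALL samples, so test samples influence the choice.
    fixed = None if select_inside else top_genes(X, y, order, k)
    picked, errors = Counter(), 0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        # Per-fold variant: a fresh gene list for every training set (one toplist per fold).
        genes = top_genes(X, y, train, k) if select_inside else fixed
        picked.update(genes)
        errors += centroid_errors(X, y, train, test, genes)
    return errors / len(y), picked


def simulate(n_per_class=30, n_genes=1000, informative=(), shift=3.0, seed=0):
    """Hypothetical data: N(0, 1) noise; genes listed in `informative` shifted by `shift` in class 1."""
    rng = random.Random(seed)
    y = [0] * n_per_class + [1] * n_per_class
    X = [[rng.gauss(0.0, 1.0) + (shift if c == 1 and g in informative else 0.0)
          for g in range(n_genes)] for c in y]
    return X, y


if __name__ == "__main__":
    X, y = simulate(informative=range(5))
    err, picked = cross_validate(X, y, k=10)
    print("CV error:", err, "most frequent genes:", picked.most_common(5))
    noise_X, noise_y = simulate(seed=2)  # no real signal at all
    print("per-fold selection:", cross_validate(noise_X, noise_y, k=10)[0])
    print("selection on all samples:", cross_validate(noise_X, noise_y, k=10, select_inside=False)[0])
```

On the pure-noise data, the variant that selects genes on all samples typically reports a much lower error than the per-fold variant, even though no gene carries any real signal. That gap is the selection bias that per-fold gene selection avoids. In CMA itself the corresponding steps are the learning sets, GeneSelection, toplist and the classification function.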
