Selecting genes for machine learning


January Weiner ▴ 370

@january-weiner-3999

Last seen 11.4 years ago

Dear all,

what is currently regarded as the optimal strategy to select genes for machine learning analysis? Taking all of the 40k or so genes is not feasible (at least with randomForest, which I use). "Bioconductor Case Studies" suggests using nsFilter with the argument var.cutoff=0.75; however, I am not sure how that cutoff is calculated. Are the genes sorted according to absolute variance? If so, is that method really suitable for filtering out "uninteresting" genes?

Kind regards,

January

--
Dr. January Weiner 3
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web: www.mpiib-berlin.mpg.de
Tel: +49-30-28460514


updated 14.6 years ago by Djork Clevert ▴ 210 • written 14.6 years ago by January Weiner ▴ 370


Sean Davis 21k

@sean-davis-490

Last seen 13 days ago

United States

Filtering by variance is certainly an acceptable way to go.

Sean

(In reply to January Weiner's message of Fri, Jun 24, 2011, 10:27 AM.)
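To make the idea of a quantile-based variability filter concrete, here is a minimal sketch in plain Python (standard library only). It is not the nsFilter implementation. It assumes, as far as the genefilter documentation describes the defaults, that the variability measure is the interquartile range (IQR) rather than the variance, and that var.cutoff is read as a quantile, so that a cutoff of 0.75 discards roughly the 75% least variable genes. The names `iqr` and `filter_by_variability` and the toy expression values are hypothetical and exist only for illustration.

```python
import statistics


def iqr(values):
    """Interquartile range of one gene's expression values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_variability(expr, cutoff=0.75, measure=iqr):
    """Keep only the most variable genes.

    expr    -- dict mapping gene name to a list of expression values
    cutoff  -- fraction of genes to discard, counted from the least variable
    measure -- per-gene variability score (assumed default: IQR)

    This is a rank-based approximation: it drops the int(cutoff * n)
    lowest-scoring genes instead of comparing against an interpolated
    quantile threshold.
    """
    scores = {gene: measure(values) for gene, values in expr.items()}
    ranked = sorted(scores, key=scores.get)  # least variable first
    n_drop = int(cutoff * len(ranked))
    kept = set(ranked[n_drop:])
    return {gene: values for gene, values in expr.items() if gene in kept}


# Toy data (made up): four genes measured in four samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [2.0, 8.0, 3.0, 9.0],
    "geneC": [7.0, 7.2, 6.9, 7.1],
    "geneD": [1.0, 1.5, 1.2, 1.1],
}

filtered = filter_by_variability(expr, cutoff=0.75)
print(sorted(filtered))  # ['geneB']
```

Passing `measure=statistics.variance` instead of the default would rank genes by sample variance, which corresponds to the "absolute variance" reading in the question. Either way the filter never looks at class labels, so it only removes genes that barely change across samples; it does not by itself pick out genes that are relevant to the outcome.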

written 14.6 years ago by Sean Davis 21k


Djork Clevert ▴ 210

@djork-clevert-422

Last seen 11.4 years ago

Dear January,

if you have Affymetrix data, you could try filtering genes by their information content. The method is described in the Bioinformatics publication "I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data", available at http://bioinformatics.oxfordjournals.org/content/23/21/2897.full. The I/NI filter is included in our farms package, which, according to the Affycomp benchmark, is the leading summarization method with respect to sensitivity and specificity.

Greetings from Berlin,

Okko

--
dipl.-inf. djork clevert | gleimstr. 13a | d-10437 berlin
e: okko at clevert.de
p: +49.30.4432 4702
f: +49.30.6883 5307

(In reply to January Weiner's message of 24.06.2011, 16:27.)

written 14.6 years ago by Djork Clevert ▴ 210
