Selecting genes for machine learning
@january-weiner-3999
Dear all,

what is currently regarded as the optimal strategy to select genes for machine learning analysis? Taking all of the 40k or so genes is not doable (at least with randomForest, which I use). "Bioconductor Case Studies" suggests using nsFilter with argument var.cutoff=0.75; however, I am not sure how that is calculated. Are the genes sorted according to absolute variance? If yes, is that method really suitable for filtering "uninteresting" genes?

Kind regards,

January

--
Dr. January Weiner 3
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web: www.mpiib-berlin.mpg.de
Tel: +49-30-28460514
@sean-davis-490
Filtering by variance is certainly an acceptable way to go.

Sean

(In reply to January Weiner's message of Fri, Jun 24, 2011, 10:27 AM.)
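To make the "how is that calculated" part of the question concrete, here is a minimal sketch of a quantile-based spread filter. It is plain Python, not the genefilter code: the function names (iqr, filter_by_spread) and the toy data are made up for illustration. The assumption, which is worth checking against ?nsFilter, is that var.cutoff is read as a quantile of a per-gene spread measure (IQR by default, as far as I recall) rather than as an absolute variance threshold, so 0.75 would keep roughly the top quarter of genes by spread.

```python
import statistics


def iqr(values):
    """Interquartile range of one gene's expression values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_spread(expr, cutoff=0.75, spread=iqr):
    """Keep genes whose spread lies strictly above the `cutoff` quantile.

    expr   : dict mapping gene id -> list of expression values (one per sample)
    cutoff : quantile in [0, 1] of the per-gene spreads used as threshold
    spread : function computing the spread of one gene (hypothetical default: iqr)
    """
    spreads = {gene: spread(vals) for gene, vals in expr.items()}
    ordered = sorted(spreads.values())

    # Threshold = `cutoff` quantile of the spreads, by linear interpolation.
    pos = cutoff * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    threshold = ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return {g: v for g, v in expr.items() if spreads[g] > threshold}


# Toy data: four "genes" measured in five samples (values are invented).
toy = {
    "g1": [5.0, 5.0, 5.0, 5.0, 5.0],      # flat, IQR 0
    "g2": [1.0, 2.0, 3.0, 4.0, 5.0],      # IQR 2
    "g3": [2.0, 4.0, 6.0, 8.0, 10.0],     # IQR 4
    "g4": [0.0, 10.0, 20.0, 30.0, 40.0],  # IQR 20
}
kept = filter_by_spread(toy, cutoff=0.75)
```

Because a filter of this kind never looks at the class labels, it can be applied once before cross-validating the classifier without leaking label information; whether high spread really marks the "interesting" genes is a separate question that the sketch does not settle.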
Djork Clevert ▴ 210
@djork-clevert-422
Dear January,

if you have Affymetrix data, you could try filtering genes by their information content. The method is described in the Bioinformatics paper "I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data", available at http://bioinformatics.oxfordjournals.org/content/23/21/2897.full. The I/NI filter is included in our farms package, which, according to the Affycomp benchmark, is the leading summarization method with respect to sensitivity and specificity.

Greetings from Berlin,

Okko

--
dipl.-inf. djork clevert | gleimstr. 13a | d-10437 berlin
e: okko at clevert.de
p: +49.30.4432 4702
f: +49.30.6883 5307

(In reply to January Weiner's message of 24.06.2011, 16:27.)