Gene Pre-filtering: My Two Shekels


Assaf Oron ▴ 40


Hi all,

Allow me to add my perspective as a relative newcomer to this field.

At first I too was alarmed by the apparent violation of statistical orthodoxy involved in pre-filtering. But after witnessing how well this works on real data, my opinion has changed.

I feel that both the statistician's perspective of p-values and inference and the data-miner's perspective of signal vs. noise and informative probes can be misleading if taken in isolation.

What has helped me is thinking of the original scientific problem. We have a large number of genes, belonging (roughly speaking) to three groups: differentially expressed, non-differentially expressed, and not expressed at all. Typically, our task is to identify the first group.

Now, neglecting to pre-filter is equivalent to conflating the second and third groups (or, equivalently, assuming that the third group does not exist). Indeed, the currently prevalent differential-expression (DE) methodology ignores the existence of 3 groups. This obviously leads to errors.

Prefiltering via nsFilter or otherwise (e.g., the McClintick and Edenberg 2006 article referred to by Mark) is equivalent to trying to identify and remove the third group, and then using DE methodology to separate the first two. A more sophisticated version of prefiltering has recently been suggested by Calza et al. 2007:

S. Calza, W. Raffelsberger, A. Ploner et al. Filtering genes to improve sensitivity in oligonucleotide microarray data analysis. Nucleic Acids Research 35(16), e102.

I haven't tried this on any data yet, but they do have a home-grown R package available.

My own gut feeling is that much can be gained by looking at all 3 groups together and trying to distinguish between them in "one fell swoop". Once the problem is seen this way, we have the whole pattern-recognition arsenal of machine learning at our disposal.

Cheers, Assaf
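To make the three-group argument concrete, here is a rough sketch in Python of "remove the unexpressed group, then test". It is not nsFilter and not the Calza et al. method. The intensities, p-values, the `MIN_MEAN_LOG2` cutoff and the helper names are all invented for illustration, and the multiple-testing step is an ordinary Benjamini-Hochberg adjustment, which the post itself does not name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_hits(genes, fdr=0.05):
    """Number of genes whose BH-adjusted p-value is at or below `fdr`."""
    adjusted = bh_adjust([p for _, p in genes])
    return sum(1 for q in adjusted if q <= fdr)


# Each gene is (mean log2 intensity over ALL arrays, raw p-value).
# All numbers are made up for this toy example.
de_genes = [(9.5, 0.001), (8.7, 0.004), (10.1, 0.008), (9.0, 0.012)]
non_de_genes = [(9.2, 0.20), (8.9, 0.45), (10.4, 0.70), (9.8, 0.90)]
unexpressed = [(3.0 + 0.1 * k, p) for k, p in enumerate(
    [0.06, 0.13, 0.21, 0.29, 0.37, 0.46, 0.54, 0.62, 0.71, 0.79, 0.87, 0.95])]

all_genes = de_genes + non_de_genes + unexpressed

# Hypothetical intensity cutoff; it looks only at overall intensity,
# never at which sample group an array belongs to.
MIN_MEAN_LOG2 = 6.0
expressed = [g for g in all_genes if g[0] >= MIN_MEAN_LOG2]

hits_all = count_hits(all_genes)        # groups 2 and 3 conflated
hits_filtered = count_hits(expressed)   # group 3 removed first
print(hits_all, hits_filtered)
```

In this toy setup the same four small p-values are judged against 20 tests without filtering and against 8 tests after it, which is the whole effect: the unexpressed genes add to the multiple-testing burden without any chance of being real discoveries.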




Talloen, Willem [PRDBE] ▴ 40


I believe gene filtering is advisable as long as you do NOT USE THE LABELS of the arrays. You should, however, always be careful not to be too stringent; a low FDR is nice but useless if you have excluded some of the interesting genes.

Another powerful gene filtering method, which uses probe-level information for Affy chips, is I/NI calls:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/21/2897

Willem

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch On Behalf Of aoron at fhcrc.org
> Sent: Friday, 27 June 2008 02:23
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Gene Pre-filtering: My Two Shekels
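Both of Willem's cautions (do not use the labels, do not be too stringent) can be shown with a small Python sketch. This is a plain variance ranking used as a stand-in, not the I/NI calls method and not nsFilter. The expression values, gene names and `keep_fraction` settings are invented for illustration.

```python
from statistics import pvariance


def label_blind_filter(expr, keep_fraction):
    """Keep the most variable genes across all arrays.

    The function never receives group labels, so the filter cannot
    peek at the comparison that will be tested afterwards.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


# Toy log2 intensities for six arrays (3 control, then 3 treated).
# Values are made up; the column order is known to us but not to the filter.
expr = {
    "geneA": [7.0, 7.2, 6.9, 9.1, 9.0, 9.3],  # strong change
    "geneB": [8.0, 8.1, 7.9, 8.6, 8.7, 8.5],  # modest but consistent change
    "geneC": [6.0, 6.1, 5.9, 6.0, 6.1, 6.0],  # flat
    "geneD": [3.1, 3.0, 3.2, 3.1, 3.0, 3.1],  # flat, near background
}

lenient = label_blind_filter(expr, keep_fraction=0.5)
strict = label_blind_filter(expr, keep_fraction=0.25)
print(sorted(lenient), sorted(strict))
```

With the lenient setting both changing genes survive. With the strict one, geneB is discarded before any test is run, so no FDR threshold applied afterwards can bring it back.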
