Cutoff to use for IQR filtering in genefilter
I am wondering what cutoff value I should use for IQR filtering in genefilter. I did a literature search, and the cutoff varies from paper to paper. Of the two papers I have read so far, one used 0.5 and the other used 0.18. affylmGUI offers options of 0.5, 0.25, and 0.1.

I also searched the Bioconductor archive and read that Dr. Robert Gentleman suggested filtering out the genes whose IQR is below the median, rather than using some fixed value.

I have two questions along these lines:

(1) How small is a gene's variance, numerically, if its IQR is some value, say, 0.5 or 0.1? Can I calculate it?
(2) When the median is used instead of a fixed number, wouldn't the cutoff be too large, since the median of a gene's expression intensities across samples can be anything?

Thanks,

Seungwoo
------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center
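For question (1), a rough conversion is possible if you are willing to assume that a gene's (log-scale) expression values across arrays are approximately normal: for a normal distribution the IQR equals 2 * qnorm(0.75), about 1.349, standard deviations. The sketch below is only an illustration under that assumption; real expression data are often not normal, and with a handful of arrays the sample IQR is itself a noisy estimate. The helper name iqr_to_var is hypothetical.

```r
# Convert an IQR into an approximate standard deviation and variance,
# ASSUMING the expression values are roughly normally distributed.
# For a normal distribution, IQR = q75 - q25 = 2 * qnorm(0.75) * sigma,
# i.e. about 1.349 * sigma. iqr_to_var() is a hypothetical helper.
iqr_to_var <- function(iqr) {
  k <- 2 * qnorm(0.75)   # about 1.349
  sigma <- iqr / k       # implied standard deviation
  data.frame(IQR = iqr, SD = sigma, Var = sigma^2)
}

# Cutoffs mentioned in the question (papers and affylmGUI)
print(iqr_to_var(c(0.5, 0.25, 0.18, 0.1)))

# Sanity check on simulated normal data with sd = 0.37
set.seed(1)
x <- rnorm(1e5, mean = 8, sd = 0.37)
print(c(IQR = IQR(x), Var = var(x)))  # roughly 0.5 and 0.137
```

Under the normal assumption, an IQR of 0.5 corresponds to an SD of about 0.37 (variance about 0.14), and an IQR of 0.1 to an SD of about 0.074 (variance about 0.0055).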
Hi Seungwoo,

The range/IQR/SE/SD of your data depends on a number of factors, including biological variability and various sources of technical variability, which can include the choice of normalisation algorithm (think RMA vs MAS5). Basically, an IQR cutoff of 0.1 might remove half the genes in my study, whereas in yours it might remove only 10% of them. Suggestions such as Robert's are useful because they use the IQRs of YOUR data to set the cutoff.

I suggest calculating the IQRs for all of your genes, and then either plotting them with plot(density(IQRs)) or just running summary(IQRs), which will give you a good feel for just how variable your data are.

If you need help calculating the IQRs and/or variances of your genes, please post back to the list.

cheers,
Mark

On 22/06/2008, at 9:05 PM, Seungwoo Hwang wrote:
> I am wondering what cutoff value I should use for IQR filtering in genefilter.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------------------------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
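The approach suggested in the reply, together with Robert's median rule, can be sketched in base R as follows. The expression matrix here is simulated purely for illustration (expr, n_genes, gene_sd and the other object names are hypothetical); with real data you would use your normalised matrix, e.g. exprs(eset) from Biobase. Note that the median in the median rule is the median of the per-gene IQRs taken across all genes, not the median expression level of any gene, so by construction it keeps about half of the genes whatever the scale of the data. genefilter's varFilter() offers a packaged quantile-based version of this kind of filter; check ?varFilter for its arguments and defaults.

```r
# Simulated expression matrix for illustration only:
# rows = genes, columns = arrays, values on a log2-like scale.
set.seed(42)
n_genes  <- 1000
n_arrays <- 12
gene_sd  <- rexp(n_genes, rate = 4)   # genes differ in variability
expr <- matrix(rnorm(n_genes * n_arrays, mean = 7,
                     sd = rep(gene_sd, times = n_arrays)),
               nrow = n_genes)         # filled column-wise, so row g has sd gene_sd[g]
rownames(expr) <- paste0("gene", seq_len(n_genes))

# One IQR per gene, computed across the arrays (i.e. along each row)
IQRs <- apply(expr, 1, IQR)

# Get a feel for how variable the data are
print(summary(IQRs))
plot(density(IQRs), main = "Per-gene IQR")

# Fraction of genes kept by the fixed cutoffs mentioned in the thread
fixed <- c(0.5, 0.25, 0.1)
print(setNames(sapply(fixed, function(cut) mean(IQRs >= cut)), fixed))

# Median rule: cutoff = median of the per-gene IQRs
cutoff   <- median(IQRs)
keep     <- IQRs > cutoff
filtered <- expr[keep, , drop = FALSE]
print(nrow(filtered))   # about half of n_genes
```

Rerunning with a different rate in rexp() (i.e. a more or less variable dataset) changes the fractions kept by the fixed cutoffs, while the median rule keeps about half the genes either way.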
