Cutoff to use for IQR filtering in genefilter
@seungwoo-hwang-2520
Last seen 10.2 years ago
I am wondering what cutoff value I should use for IQR filtering in genefilter. I did a literature search, and the value varies from paper to paper. Of the two papers I have read so far, one used 0.5 and the other used 0.18. affylmGUI offers options of 0.5, 0.25, and 0.1.

I also searched the Bioconductor archive and read that Dr. Robert Gentleman suggested filtering out the genes whose IQR is below the median, rather than below some fixed value.

I have two questions in this vein.

(1) How small is a gene's variance (as a number) if its IQR is some value, say 0.5 or 0.1? Can I calculate it?
(2) When the median is used instead of a fixed number, wouldn't it be too large, since the median of a gene's expression intensities across samples can be anything?

Thanks,
Seungwoo
------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center
affylmGUI • 3.6k views
Mark Cowley ▴ 400
@mark-cowley-2858
Last seen 9.2 years ago
Australia
Hi Seungwoo,

The range/IQR/SE/SD of your data depends on a number of factors, including biological variability and sources of technical variability, which can include the type of normalisation algorithm (think RMA vs MAS5). Basically, applying an IQR filter of 0.1 in my study might remove half the genes, whereas in your study it might remove 10% of them. Suggestions such as Robert's are useful because they use the IQR of YOUR data to set the cutoff.

I suggest calculating the IQRs for all of your genes, and then either plotting them with plot(density(IQRs)) or just trying summary(IQRs), which will give you a good feel for just how variable your data is.

If you need help calculating the IQRs and/or variances of your genes, please post back to the list.

cheers,
Mark
----------------------------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
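As a rough illustration of the suggestion above, the per-gene IQRs can be computed from the expression matrix and inspected before choosing any cutoff. The sketch below uses base R only. `expr` is a simulated genes-by-samples matrix standing in for real log2 expression values (an assumption, not data from this thread), and `iqr_to_sd` is a hypothetical helper. The conversion from IQR to variance assumes each gene's values are roughly normally distributed, in which case IQR is about 1.349 times the SD; real data may deviate from this.

```r
set.seed(1)

# Simulated stand-in for a log2 expression matrix: 1000 genes x 10 samples.
# With a real ExpressionSet one would typically start from exprs(eset) instead.
n_genes <- 1000
n_samples <- 10
gene_sd <- runif(n_genes, min = 0.05, max = 1)  # one true SD per gene (assumed)
expr <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = gene_sd),
               nrow = n_genes, ncol = n_samples)

# One IQR per gene (row).
IQRs <- apply(expr, 1, IQR)

# Numerical summary of how variable the genes are;
# plot(density(IQRs)) shows the same information graphically.
print(summary(IQRs))

# Hypothetical helper: under normality, IQR = 2 * qnorm(0.75) * SD,
# i.e. roughly 1.349 * SD.
iqr_to_sd <- function(iqr) iqr / (2 * qnorm(0.75))

# Cutoffs mentioned in the question, with approximate SD and variance.
cutoffs <- c(0.5, 0.25, 0.18, 0.1)
approx_table <- data.frame(IQR = cutoffs,
                           approx_SD = iqr_to_sd(cutoffs),
                           approx_variance = iqr_to_sd(cutoffs)^2)
print(approx_table)

# Fraction of genes each fixed cutoff would retain in THIS data set.
frac_kept <- sapply(cutoffs, function(cut) mean(IQRs > cut))
print(data.frame(cutoff = cutoffs, fraction_kept = frac_kept))
```

Under the normality assumption, an IQR of 0.5 corresponds to a standard deviation of about 0.37 and a variance of about 0.14, while an IQR of 0.1 corresponds to a variance of about 0.0055. The fraction of genes retained by each fixed cutoff depends entirely on the data set, which is the point made above.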
Hi Mark,

Am I right in the interpretation that using the median of the distribution of IQRs as the cutoff would remove 50% of the genes in every analysis? As below:

    eset <- readAffy()
    IQRs <- esApply(eset, 1, IQR)
    f1 <- function(x) ( IQR(x) > median(IQRs) )
    selected <- genefilter(eset, f1)

What happens if more than 50% of the genes are variable or, for that matter, less than 50%? Should one plot the IQRs against some value of interest, e.g. the t-test statistic, and determine the IQR cutoff on that basis?

Thanks,
Fraser
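The median-of-IQRs filter described in this post can be sketched without any array-specific packages. The example below is a minimal base-R version on a simulated matrix (`expr` is a hypothetical stand-in for the expression values of `eset`). It is not the genefilter implementation, only an illustration of why such a cutoff keeps about half the genes. Note that the median used here is the median of the per-gene IQRs, not the median of any gene's expression values.

```r
set.seed(2)

# Simulated stand-in for exprs(eset): 1000 genes x 12 samples (assumed sizes).
n_genes <- 1000
expr <- matrix(rnorm(n_genes * 12, mean = 8, sd = runif(n_genes, 0.05, 1)),
               nrow = n_genes)
rownames(expr) <- paste0("gene", seq_len(n_genes))

# One IQR per gene, then a single data-driven cutoff: the median of those IQRs.
IQRs <- apply(expr, 1, IQR)
cutoff <- median(IQRs)

# Keep the genes whose IQR is strictly above the cutoff.
keep <- IQRs > cutoff
selected <- expr[keep, , drop = FALSE]

cat("cutoff (median IQR):", round(cutoff, 3), "\n")
cat("genes kept:", sum(keep), "of", n_genes, "\n")
```

With an even number of genes and no tied IQRs, exactly half the genes pass, whatever the scale of the data. The cutoff value itself changes from data set to data set, but the retained fraction does not.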
Hi Fraser,

That's exactly right: using the median IQR as the filter will remove 50% of your genes every time. An alternative would be to use the 20th percentile of the IQRs as your filter, which removes the least variable 20%. Since all of the IQRs make up a distribution of numbers, there will always be a median of that distribution.

I think the question you're asking is: what if the median IQR is still not variable enough in a biological context, or, in a system with large changes, might a median IQR filter remove too many genes that have large variability? That is where plotting the data, perhaps against the t-statistics as you suggested, would be a good means of choosing the best filter. Plotting IQR vs average expression level, or IQR vs standard deviation, might also help.

Incidentally, I rarely use a variability filter. I rely on the statistics with FDR < 5%, and accept that some of these will be due to genes with small but consistent differences.

cheers,
Mark
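The percentile-based alternative mentioned above can be sketched in the same way. The code below is an illustration in base R under stated assumptions: `expr` is simulated data, and `filter_by_iqr_quantile` is a hypothetical helper name, not a genefilter function. It also computes the per-gene mean and standard deviation that the suggested diagnostic plots would use.

```r
set.seed(3)

# Simulated stand-in for a log2 expression matrix: 1000 genes x 10 samples.
n_genes <- 1000
expr <- matrix(rnorm(n_genes * 10, mean = 8, sd = runif(n_genes, 0.05, 1)),
               nrow = n_genes)

# Hypothetical helper: drop the least variable fraction `prob` of genes,
# using the corresponding quantile of the per-gene IQRs as the cutoff.
filter_by_iqr_quantile <- function(x, prob) {
  iqrs <- apply(x, 1, IQR)
  cutoff <- quantile(iqrs, probs = prob, names = FALSE)
  list(cutoff = cutoff, keep = iqrs > cutoff, iqrs = iqrs)
}

res20 <- filter_by_iqr_quantile(expr, 0.20)  # removes the least variable 20%
res50 <- filter_by_iqr_quantile(expr, 0.50)  # same as the median-IQR filter

cat("20th percentile cutoff:", round(res20$cutoff, 3),
    "keeps", sum(res20$keep), "genes\n")
cat("50th percentile cutoff:", round(res50$cutoff, 3),
    "keeps", sum(res50$keep), "genes\n")

# Quantities for the diagnostic plots suggested above.
gene_mean <- rowMeans(expr)
gene_sd <- apply(expr, 1, sd)

# Uncomment to draw the plots in an interactive session:
# plot(gene_mean, res20$iqrs, xlab = "mean expression", ylab = "IQR")
# abline(h = res20$cutoff, lty = 2)
# plot(gene_sd, res20$iqrs, xlab = "standard deviation", ylab = "IQR")
```

The fraction removed is fixed by the chosen percentile, so the choice of percentile is where the judgment lies. The plots are one way to see whether the genes below the line look like uninformative, low-variability genes in a particular data set.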