Filtering is not recommended with LIMMA?


Garcia Orellana,Miriam ▴ 150

@garcia-orellanamiriam-5283

Last seen 9.6 years ago

Dear Dr. Smyth,

Would you be so kind as to help me decide whether or not to filter my microarray data set with a filtering method that corrects for variance, such as the I/NI method from Talloen et al. (2007)? Many researchers say that filtering should increase the power of the test, thereby increasing the chance of finding truly differentially expressed genes (DEG). However, when I analyzed my data set I found the following, that is, a lower number of DEG when filtering:

Number of genes (adjusted P < 0.05 and FC > 1.4)

Orthogonal contrast   w/o filtering   I/NI filtering
FAT                   195             118
FA                    329             151
MR                    169             103
FAT by MR             854             321
FA by MR              961             283

Also, I found that Bourgon et al. (2010) do not recommend combining the limma t-statistic with filtering. So I would appreciate your suggestion on whether or not to filter my data set.

Thanks in advance.
Miriam

Miriam Garcia, MS, PhD
Department of Animal Sciences
University of Florida

Microarray limma • 2.6k views

ADD COMMENT • link updated 10.9 years ago by Gordon Smyth 50k • written 10.9 years ago by Garcia Orellana,Miriam ▴ 150


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 16 days ago

EMBL European Molecular Biology Laborat…

Miriam,

To clarify: Bourgon et al. (2010) discourage the use of the limma t-statistic specifically with overall-variance filtering, since this invalidates type-I error control of the combined procedure. Either component by itself (limma t; or overall-variance filtering and the ordinary t) is fine. Nobody yet seems to have worked out how to combine them (and it may not be worthwhile). I leave it to others to comment on I/NI filtering and limma.

Also, as Jelle Goeman notes, by combining the threshold on adjusted P with one on FC (> 1.4) you are being anti-conservative. This, in combination with your filtering, likely explains the effect you see.

Bottom line: by combining three criteria

- limma t
- I/NI
- FC cutoff

you are putting yourself into a difficult area of statistics, and unless you really know what you are doing, it might be best to deconvolute your criteria.

Best wishes
Wolfgang

On 21 May 2013, at 03:06, "Garcia Orellana,Miriam" <mgarciao at ufl.edu> wrote:
> Dear Dr. Smyth.
> [...]
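On the point above about combining an adjusted p-value threshold with a fold-change cutoff: one way to reduce the number of stacked criteria is limma's `treat()`, which tests each gene against a log-fold-change threshold directly instead of applying the cutoff after the test. The sketch below is only an illustration on simulated data and is not something proposed in this thread; the data, the group labels, and the choice of `log2(1.4)` as the threshold (taken from the FC > 1.4 cutoff in the question) are assumptions.

```r
library(limma)

set.seed(2)
n_genes <- 1000
group <- factor(rep(c("ctrl", "trt"), each = 4))

# Simulated log2 expression matrix (hypothetical data, genes in rows)
y <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.3), nrow = n_genes)
rownames(y) <- paste0("gene", seq_len(n_genes))
# Make the first 40 genes truly differentially expressed
y[1:40, group == "trt"] <- y[1:40, group == "trt"] + 1.5

design <- model.matrix(~ group)
fit <- lmFit(y, design)

# Two-step selection: moderated t-test against zero, then a post-hoc FC cutoff
fit0 <- eBayes(fit)
tab0 <- topTable(fit0, coef = 2, number = Inf)
two_step <- rownames(tab0)[tab0$adj.P.Val < 0.05 & abs(tab0$logFC) > log2(1.4)]

# Single criterion: test the null hypothesis |logFC| <= log2(1.4)
fit_t <- treat(fit, lfc = log2(1.4))
tab_t <- topTreat(fit_t, coef = 2, number = Inf)
one_step <- rownames(tab_t)[tab_t$adj.P.Val < 0.05]

length(two_step)
length(one_step)
```

With `treat()` the adjusted p-value already refers to the fold-change threshold, so a single cutoff on it defines the gene list.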

ADD COMMENT • link 10.9 years ago Wolfgang Huber ★ 13k


Gordon Smyth 50k

@gordon-smyth

Last seen 7 minutes ago

WEHI, Melbourne, Australia

Dear Miriam,

I don't know what I/NI filtering is and it isn't really my job to make a running commentary on every filtering method that gets published. However, the limma algorithm analyses the spread of the genewise variances. Any filtering method based on genewise variances will change the distribution of variances, will interfere with the limma algorithm and hence will give poor results.

Like most people, I recommend filtering out genes that don't appear to be expressed in any sample. See for example Case studies 15.3 or 15.4 in the limma User's Guide. However, you will find if you use eBayes(fit,trend=TRUE) instead of the usual eBayes(fit) that limma gives pretty good results regardless of how much filtering you do, provided of course that the filtering is on expression and not on variance.

The literature tends to say that the reason for filtering is to reduce the amount of multiple testing, but in truth the increase in power from this is only slight. The more important reason for filtering in most applications is to remove highly variable genes at low intensities. The importance of filtering is highly dependent on how you pre-processed your data. Filtering is less important if you (i) use a good background correction or normalising method that damps down variability at low intensities and (ii) use eBayes(trend=TRUE), which accommodates a mean-variance trend.

Best wishes
Gordon
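The two recommendations above (filter on expression rather than on variance, and use `eBayes(fit, trend=TRUE)`) can be sketched as follows. This is a minimal illustration on simulated log2 intensities, not the code from the limma User's Guide case studies; the data, the intensity cutoff `cutoff`, and the minimum number of arrays `min_arrays` are all hypothetical choices.

```r
library(limma)

set.seed(1)
n_genes <- 2000
n_arrays <- 6
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Simulated log2 expression: each gene has its own mean intensity,
# and low-intensity genes are noisier (a mean-variance trend)
mu <- runif(n_genes, min = 4, max = 14)
sdev <- 0.2 + 1.5 / mu
y <- matrix(rnorm(n_genes * n_arrays, mean = mu, sd = sdev), nrow = n_genes)
rownames(y) <- paste0("gene", seq_len(n_genes))

# Expression-based filter: keep genes above a (hypothetical) cutoff
# on at least min_arrays arrays; genewise variances are not used
cutoff <- 6
min_arrays <- 3
is_expressed <- rowSums(y > cutoff) >= min_arrays
y_filt <- y[is_expressed, , drop = FALSE]

# Linear model and empirical Bayes moderation with an intensity trend
design <- model.matrix(~ group)
fit <- lmFit(y_filt, design)
fit <- eBayes(fit, trend = TRUE)

tab <- topTable(fit, coef = 2, number = Inf)
head(tab)
```

The filter depends only on how many arrays show a gene above the cutoff, so it removes low-intensity genes without selecting on the variances that the empirical Bayes step models.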

ADD COMMENT • link 10.9 years ago Gordon Smyth 50k


Dear Dr. Smyth,

Thanks for your explanation. I have checked the example you point to in the limma User's Guide, Case study 15.4, which uses Agilent arrays. There I found that you use a normalization method that is new, at least to me (y <- backgroundCorrect(x,method="normexp") and y <- normalizeBetweenArrays(y,method="quantile")), followed by filtering and the trend=TRUE option that you suggested in your first reply.

I have been using GCRMA as the normalization method, so I am wondering whether I could still use the filtering with trend=TRUE together with GCRMA. I also tried to find requested or published code that performs the same analysis as in Case study 15.4 on Affymetrix arrays instead of Agilent, and I couldn't find any. Does that mean it does not work for Affymetrix? I guess I am wrong.

Thank you very much indeed.
Miriam

Miriam Garcia, MS, PhD
Department of Animal Sciences
University of Florida

________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU]
Sent: Wednesday, May 22, 2013 7:37 PM
To: Garcia Orellana,Miriam
Cc: Bioconductor mailing list
Subject: Filtering is not recommended with LIMMA?

[...]

ADD REPLY • link 10.9 years ago Garcia Orellana,Miriam ▴ 150


Dear Gordon

> The literature tends to say that the reason for filtering is to reduce the amount of multiple testing, but in truth the increase in power from this is only slight. The more important reason for filtering in most applications is to remove highly variable genes at low intensities. The importance of filtering is highly dependent on how you pre-processed your data. Filtering is less important if you (i) use a good background correction or normalising method that damps down variability at low intensities and (ii) use eBayes(trend=TRUE) which accommodates a mean-variance trend.

With all respect, I think this paragraph mixes up two separate issues and can benefit from clarification.

1. While literature can probably be found to support any statement, the above-cited reason is indeed bogus when multiple testing is performed with an FDR objective. The paper by Bourgon et al. motivates filtering differently, namely by using a filter criterion that is independent of the test statistic under the null (thus does not affect type-I error; some subtlety is discussed in that paper) but dependent under the alternative (thus improves power).

2. "Highly variable genes at low intensities" are indeed a problem of bad preprocessing and are better dealt with at that level, not by filtering. Nowadays, the commonly used methods for expression microarray or RNA-Seq analysis that I am aware of avoid that problem.

3. The question of when and how independent filtering (as in 1) is beneficial is quite unrelated to preprocessing.

You are right that FDR is a property of the whole selected gene list, not of individual genes, and that different approaches exist for spending the type-I error budget wisely, by weighting different genes differently; of which independent filtering is one and trended eBayes (which is not the default option in limma) may be another.

Best wishes
Wolfgang

Reference: Bourgon et al. PNAS 2010: http://www.pnas.org/content/107/21/9546
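To make point 1 concrete, here is a minimal base-R sketch of one combination described in this thread as fine: an overall-variance filter (computed across all samples, ignoring group labels) followed by the ordinary equal-variance two-sample t-test and Benjamini-Hochberg adjustment. The simulated data, the group sizes, and the choice to drop the lower half of genes by variance are assumptions made purely for illustration.

```r
set.seed(42)
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 5))

# Simulated expression matrix (hypothetical data, genes in rows)
x <- matrix(rnorm(n_genes * 10), nrow = n_genes)
# Make the first 50 genes truly differentially expressed
x[1:50, group == "B"] <- x[1:50, group == "B"] + 2

# Stage 1: filter on overall variance, computed without the group labels
overall_var <- apply(x, 1, var)
keep <- overall_var > quantile(overall_var, 0.5)  # hypothetical fraction

# Stage 2: ordinary two-sample t-test on the genes that pass the filter
pvals <- apply(x[keep, , drop = FALSE], 1, function(v) {
  t.test(v[group == "A"], v[group == "B"], var.equal = TRUE)$p.value
})

# Multiple-testing adjustment over the retained genes only
padj <- p.adjust(pvals, method = "BH")
sum(padj < 0.05)
```

The filter statistic never looks at the group labels, which is what makes it a candidate for independence from the t-statistic under the null; replacing the ordinary t-test here with the limma moderated t is the combination that Bourgon et al. discourage.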

ADD REPLY • link 10.9 years ago Wolfgang Huber ★ 13k
