False Discovery Rate Questions


Sally ▴ 250

@sally-2430


An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080413/eea0105e/attachment.pl


ADD COMMENT • link updated 16.1 years ago by James W. MacDonald 65k • written 16.1 years ago by Sally ▴ 250


James W. MacDonald 65k

@james-w-macdonald-5106


Hi Sally,

Sally wrote:
> We had a discussion in our lab about false discovery rate correction.
> Most people felt a little disappointed with their microarray data
> after in many cases a long hard slog with the lab process. The FDR
> (BY, BH etc. etc.) knocked their un-adjusted gene lists into
> very, very short lists of differentially expressed genes. I work on
> disease pathology and thought microarrays would provide me with a
> constellation of clues. Instead I find for some time periods I have
> none. This is with a 17,500 cDNA custom chip.
>
> It surprised me that many knew of papers published without applying
> an FDR adjustment. I had just assumed that a proper statistical
> analysis had to have one.

If you are looking for a proper statistical analysis, you had better stop using microarrays ;-D

I think people put way too much stock in multiplicity adjustments. The goal of these adjustments is to limit the number of false positives in a set of 'significant' genes. However, to my knowledge all multiplicity adjustments are monotonic, so the base ranking of the genes will never change.

So let's say you figure you can reasonably validate 50 genes. If you do the FDR adjustment and end up with 5000 genes at a 5% FDR, does that really help you? You will either take the top 50 and validate, or scan the top of the list for 'interesting' genes (I call this an eyeballometric analysis) and then validate those. Or maybe you will try to subset the significant genes by some interesting process or pathway. Regardless, the fact that you have a 5% FDR will likely be irrelevant.

On the other hand, let's say you have no genes at a 5% FDR. This doesn't mean there are no significant genes; instead it means that you don't have the power to detect differences. But the genes at the top of the list are (according to the data in hand) the most likely to validate, so you could just take the top 50 and validate. Or you could skip the adjustment and take however many genes have an unadjusted p-value below 0.005, or whatever cutoff you choose.

Best,

Jim

> I wonder what people are doing regarding using/not using an FDR?
>
> What FDR corrections are people using?
>
> What newer alternatives are there that are less conservative?
>
> Sally Goldes

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
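To illustrate the monotonicity point made in the reply above, here is a minimal Python sketch of the Benjamini-Hochberg (BH) adjustment. It is not code from the thread: the function name `bh_adjust` and the eight p-values are made up for illustration, and in practice an established implementation such as R's `p.adjust` would normally be used instead.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking a running minimum of
    # p * m / rank so the adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for eight genes (illustrative only).
raw = [0.0004, 0.21, 0.003, 0.74, 0.012, 0.048, 0.0009, 0.35]
adj = bh_adjust(raw)

# List the adjusted values in order of increasing raw p-value and check
# that they never decrease: the gene ranking is preserved (up to ties).
order = sorted(range(len(raw)), key=lambda i: raw[i])
adj_in_raw_order = [adj[i] for i in order]
is_monotone = all(a <= b for a, b in zip(adj_in_raw_order, adj_in_raw_order[1:]))

n_raw = sum(p < 0.05 for p in raw)  # genes passing an unadjusted 0.05 cutoff
n_adj = sum(p < 0.05 for p in adj)  # genes passing a 5% FDR cutoff

print("ranking preserved:", is_monotone)
print("genes below 0.05 (raw / BH-adjusted):", n_raw, "/", n_adj)
```

The adjustment changes how many genes pass a threshold, not which genes sit at the top of the list, so a "validate the top 50" strategy selects the same genes with or without it.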

ADD COMMENT • link 16.1 years ago James W. MacDonald 65k


James W. MacDonald wrote:
> I think people put way too much stock in multiplicity adjustments.

Being a statistician myself I probably shouldn't say this, but there is some truth in this statement. It seems we statisticians have been quite successful in convincing biologists that multiple testing is an issue that needs to be addressed. Unfortunately the main message that seems to have come across is: "You should never mention a gene in a paper or do any further research into it, unless its FDR value is below 5%", which creates the very understandable frustration Sally describes in her e-mail.

My advice to biologists in a situation like this is:

a) Look at the p-value distribution! If it is skewed towards small p-values, the experiment has picked up something, even if it is not strong enough to give you many genes with an FDR-adjusted p below 5%.

b) Choose the cut-off depending on what you want to do after the microarray experiment. If the experiment is the end of the story and you only want to publish the list of most changed genes, you really want to be sure, so in that case an FDR below 5% is an appropriate criterion (any referee with some sense should criticize you otherwise. If you find a lot of papers with lists of unadjusted p-values, this doesn't mean that this is "good practice", rather that they "got away with it"). If however you want to choose candidate genes which you will study in follow-up experiments, you might very well be willing to accept that 20% or 50% of them are false positives.

c) Take biological knowledge into account. For example: if a gene has an FDR value of 50% but it was one you expected to change, or it has already been found to change in other similar studies, then it will obviously strengthen your result. A more coordinated way of utilising biological knowledge is not to analyse single genes but gene sets/pathways with one of the many gene set analysis tools (GSEA, GlobalTest, cf. also Sean Davis's posting "[BioC] combining p-values and independent genes stouffer" to this list today). A single gene may show some indication of being changed but not enough to jump the FDR<5% hurdle, but if many other related genes show a similar change, the overall result might be highly significant.

As this is an issue I have to discuss a lot with the biologists I collaborate with, I would be interested to hear how other people on the list see this.

Cheers

Claus

--
Dr Claus-D. Mayer | http://www.bioss.ac.uk
Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
Rowett Research Institute | Telephone: +44 (0) 1224 716652
Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349
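Points (a) and (c) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the thread or from any of the tools named: `pvalue_histogram` and `stouffer` are hypothetical helper names, the p-values are invented, and the Stouffer combination shown assumes one-sided p-values from independent tests, which genes in the same pathway may well violate.

```python
from math import sqrt
from statistics import NormalDist


def pvalue_histogram(pvalues, bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        # A p-value of exactly 1.0 is placed in the last bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def stouffer(pvalues):
    """Combine one-sided p-values (each strictly between 0 and 1)
    with Stouffer's Z-score method, assuming independent tests."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(z_combined)


# (a) Under the null hypothesis p-values are roughly uniform, so every
# bin should hold about the same count; an excess in the first bin
# suggests the experiment has picked up real signal.
example_pvalues = [0.01, 0.02, 0.04, 0.35, 0.55, 0.95, 1.0]
print(pvalue_histogram(example_pvalues))

# (c) Hypothetical p-values for five genes in one pathway: none is
# striking on its own, but the combined evidence is much stronger.
gene_set = [0.04, 0.06, 0.08, 0.03, 0.10]
combined = stouffer(gene_set)
print("combined p-value:", combined)
```

With real data the histogram would be computed over all genes on the chip, and a dedicated gene set tool would usually be preferable to a hand-rolled combination because it can account for correlation between genes.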

ADD REPLY • link 16.1 years ago Claus Mayer ▴ 340
