conceptual question about FDR, FDR adjusted p-value and q-value
2
12
Jack Luo ▴ 430
@jack-luo-4241
Last seen 8.1 years ago

Hi,

I am a bit confused about three concepts that I initially thought I understood: FDR, FDR adjusted p-value and q-value.

Is the FDR adjusted p-value the same as the q-value? (My understanding is that FDR adjusted p-value = original p-value * number of genes/rank of the gene. Is that right?) When people say xxx genes are differentially expressed with an FDR cutoff of 0.05, does that mean xxx genes have an FDR adjusted p-value smaller than 0.05?

Thanks,

-Jack

fdr qvalue • 48k views
0

The FDR adjusted p-value is not equal to "original p-value * number of genes/rank of the gene"

0

Why? Could you briefly describe how the FDR adjusted p-value is calculated? Thank you!

40
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

Dear Jack,

The thing to understand is that terms like FDR and q-value were defined in specific ways by their original inventors but are used in more generic ways by later researchers who adapt, modify or use the ideas.

The term "false discovery rate (FDR)" was created by Benjamini and Hochberg in their 1995 paper. They gave a particular definition of what they meant by FDR. Their procedure accepted or rejected hypotheses, but did not produce adjusted p-values.

Benjamini and Yekutieli presented another more conservative algorithm to control the FDR in a 2001 paper. Same definition of FDR, but a different algorithm.

In 2002, I re-interpreted the Benjamini and Hochberg (BH) and Benjamini and Yekutieli (BY) procedures in terms of adjusted p-values. I implemented the resulting algorithms in the function p.adjust() in the stats package, and used them in the limma package, and this led to the concept of an FDR adjusted p-value. The terminology used by the p.adjust() function and the limma package has led people to refer to "BH adjusted p-values".

The adjusted p-value definition that you give is essentially the same as the BH adjusted p-value, except that you omitted the last step in the procedure. Your definition as it stands is not an increasing function of the original p-values.
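To make the missing last step concrete, here is a minimal Python sketch (not the p.adjust() source; the function names and the example p-values are made up for illustration). It assumes the last step is a running minimum taken from the largest p-value downwards, which is what restores monotonicity.

```python
def naive_adjust(pvalues):
    """p * n / rank for each p-value (rank 1 = smallest), with no final step."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = pvalues[idx] * n / rank
    return adjusted


def bh_adjust(pvalues):
    """BH adjusted p-values: p * n / rank, followed by a running minimum
    taken from the largest p-value downwards, capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps the result at 1
    for position, idx in enumerate(order):
        rank = n - position  # the largest p-value has rank n
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, chosen so that the naive formula is not monotone.
p = [0.01, 0.04, 0.041, 0.5]
print(naive_adjust(p))
print(bh_adjust(p))
```

With these inputs the naive values are roughly 0.04, 0.08, 0.055 and 0.5, so the second gene gets a larger value than the third despite having a smaller p-value. The running minimum pulls the second value down to about 0.055, and the result is non-decreasing in the original p-values.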

In 2002, John Storey created a new definition of "false discovery rate". Storey's definition is based on Benjamini and Hochberg's original idea, but is mathematically a bit more flexible. John Storey also created the terminology "q-value" for a quantity that estimates his definition of FDR. He implemented q-value estimation procedures in an R package called qvalue.
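For reference, the two definitions are usually written as follows. The notation is an assumption added here and does not appear in the thread: V is the number of false rejections and R the total number of rejections. Storey's version is commonly called the positive FDR (pFDR).

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,1)}\right]
             = \mathbb{E}\!\left[\frac{V}{R}\,\middle|\,R>0\right]\Pr(R>0),
\qquad
\mathrm{pFDR} = \mathbb{E}\!\left[\frac{V}{R}\,\middle|\,R>0\right].
```

The two differ only by the factor Pr(R > 0), so they nearly coincide whenever at least one rejection is almost certain.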

Another important but often overlooked difference is the idea of FDR "estimation" vs FDR "control". The qvalue package attempts to give a more or less unbiased estimate of the FDR, so in practice the true FDR is about equally likely to be greater or less than the estimate. The BH approach instead controls the expected FDR. It guarantees that the true FDR will be less than the specified rate on average if you do an exactly similar experiment over and over again. So the BH approach is slightly more conservative than qvalue. The BH properties hold regardless of the number of p-values, while qvalue is asymptotic, so the BH approach is more robust than qvalue when the number of hypotheses being tested isn't very large.

So, strictly speaking, the q-value and the FDR adjusted p-value are similar but not quite the same. However the terms q-value and FDR adjusted p-value are often used generically by the Bioconductor community to refer to any quantity that controls or estimates any definition of the FDR. In this general sense the terms are synonyms.

The lesson to draw from this is that different methods and different packages are trying to do slightly different things and give slightly different results, and you should always cite the specific software and method that you have used.

Best wishes
Gordon

0

So if my understanding is correct, when we report "FDR adjusted p-values" obtained from p.adjust(p, method = "BH"), citing Benjamini and Hochberg 2005 is not completely accurate? What should we cite? Thanks.

1

If you're citing p.adjust(p, method = "BH"), the correct reference is Benjamini, Yoav, and Yosef Hochberg. "Controlling the false discovery rate: a practical and powerful approach to multiple testing." Journal of the Royal Statistical Society: Series B (Methodological) 57.1 (1995): 289-300. Get the citation in your preferred format using Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=Controlling+the+false+discovery+rate%3A+a+practical+and+powerful+approach+to+multiple+testing&btnG=

Note that this citation is the first paper listed in R's ?p.adjust help file.

0

I am not sure what paper you are referring to by "Benjamini Hochberg 2005". I did not write up and publish my adjusted p-value formulation of BH's method in a specific journal publication, so I suggest that you simply cite BH's 1995 paper when you use p.adjust with method="BH". I wrote the algorithm so that the set of genes with adjusted p-value less than some preset FDR is exactly equivalent to choosing genes according to BH's 1995 procedure with the same cutoff.
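The equivalence can be checked on a small example. The sketch below is an illustration in Python, not the p.adjust() source; the function names and p-values are hypothetical. It assumes the usual statement of the 1995 step-up rule: find the largest rank k with p_(k) <= k * alpha / n and reject the k smallest p-values.

```python
def bh_adjust(pvalues):
    """BH adjusted p-values: p * n / rank, then a running minimum from the
    largest p-value downwards, capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def bh_step_up(pvalues, alpha):
    """Step-up rule: largest rank k with p_(k) <= k * alpha / n;
    returns the indices of the k smallest p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / n:
            k = rank  # keep the largest rank that passes
    return set(order[:k])


# Hypothetical p-values; 0.012 fails its own threshold (2 * 0.05 / 10 = 0.01)
# but is still rejected because larger p-values pass theirs.
p = [0.3, 0.012, 0.8, 0.001, 0.019, 0.5, 0.013, 0.7, 0.4, 0.6]
alpha = 0.05

by_procedure = bh_step_up(p, alpha)
by_adjusted = {i for i, a in enumerate(bh_adjust(p)) if a <= alpha}
print(sorted(by_procedure), sorted(by_adjusted))
```

Both approaches select the same four indices here. The example avoids p-values that sit exactly on a threshold, where floating-point rounding could make the two comparisons disagree.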

8
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 2.1 years ago
United States
p-value = extremal probability for a test statistic under the null hypothesis, not accounting for multiple comparisons

BH p-value, pBH = extremal probability for the same, after accounting for multiple comparisons to upper-bound the overall false positive rate at <= p

q-value = direct estimate of the FDR associated with pBH

See http://genomics.princeton.edu/storeylab/papers/directfdr.pdf for the original, and quite well written, paper, where on page 485:

"The basic point that we make is that using the Benjamini and Hochberg (1995) method to control FDR at level α/π0 is equivalent to (i.e. rejects the same p-values as) using the proposed method to control FDR at level α. The gain in power from our approach is clear--we control a smaller error rate (α ≤ α/π0), yet reject the same number of tests."

q-values depend also on the estimated fraction of test p-values in the chance or uniform component of the distribution at some pFDR p.

pi0 = estimated probability (overall) of a given result being truly null (i.e., false positive) at p | FDR

q-value = BH p-value * pi0 (probability that test t incorrectly rejects the null at pBH)

So q = pBH * pi0 (++) as can be verified from the output, and directly estimates the pFDR for test t assuming independence among the tests. The mathematical justification for this is given in the paper; the basic machinery can be, and has been, extended to many other situations.

(++) If pi0 is estimated as at or very near 1.0, then pBH and q will be the same for any given test t, to the limit of machine precision (see paper).

At least that's how it appears to be implemented last time I looked at the code and the paper :-)
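The relation q = pBH * pi0 can be sketched in Python as below. This is a simplified illustration, not the qvalue package's code: pi0 is estimated with a single fixed tuning value lam = 0.5 (an assumption made here for brevity, whereas the package by default smooths over a range of values), and the function names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """BH adjusted p-values: p * n / rank, then a running minimum from the
    largest p-value downwards, capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def estimate_pi0(pvalues, lam=0.5):
    """Storey-style estimate of the null fraction: the share of p-values
    above lam, rescaled by 1 / (1 - lam) and capped at 1."""
    n = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (n * (1.0 - lam)))


def q_values(pvalues, lam=0.5):
    """q = pi0 * pBH, following the relation described above."""
    pi0 = estimate_pi0(pvalues, lam)
    return [pi0 * a for a in bh_adjust(pvalues)]


# Hypothetical p-values: three of ten exceed 0.5, so pi0 is estimated as 0.6.
p = [0.001, 0.012, 0.013, 0.019, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(estimate_pi0(p))
print(q_values(p))
```

When the pi0 estimate reaches the cap of 1, the q-values from this sketch equal the BH adjusted p-values, matching the (++) note above. A pi0 estimate of 0 (no p-values above lam) would make every q-value 0, so a real implementation needs to guard against that case.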
0

Thanks for the answer Tim.