Question: conceptual question about FDR, FDR adjusted p-value and q-value
Jack Luo wrote, 4.9 years ago:

Hi,

I am a bit confused about the concepts of the 3 things: FDR, FDR adjusted p-value and q-value, which I initially thought I was clear about.

Is an FDR adjusted p-value the same as a q-value? (My understanding is that FDR adjusted p-value = original p-value * number of genes / rank of the gene; is that right?) When people say xxx genes are differentially expressed with an FDR cutoff of 0.05, does that mean xxx genes have an FDR adjusted p-value smaller than 0.05?

Thanks,

-Jack

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 4.9 years ago:

Dear Jack,

The thing to understand is that terms like FDR and q-value were defined in specific ways by their original inventors but are used in more generic ways by later researchers who adapt, modify or use the ideas.

The term "false discovery rate (FDR)" was created by Benjamini and Hochberg in their 1995 paper. They gave a particular definition of what they meant by FDR.  Their procedure accepted or rejected hypotheses, but did not produce adjusted p-values.
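
The accept/reject step-up procedure described above can be sketched as follows. This is a minimal illustration in plain Python, not code from the 1995 paper or from any package; the function name `bh_reject` and the example p-values are made up for the purpose.

```python
def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / n) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / n * alpha:
            k_max = rank
    # Reject the k_max smallest p-values, including any that missed
    # their own threshold but rank below one that met it.
    rejected = [False] * n
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four tests.
print(bh_reject([0.01, 0.5, 0.035, 0.03], alpha=0.05))
```

Note that 0.03 is larger than its own threshold (2/4 * 0.05 = 0.025) but is still rejected, because the next p-value up, 0.035, meets its threshold (3/4 * 0.05 = 0.0375). The output is a set of decisions only; no adjusted p-value is produced.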

Benjamini and Yekutieli presented another more conservative algorithm to control the FDR in a 2001 paper. Same definition of FDR, but a different algorithm.
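
As a rough sketch of why the BY algorithm is more conservative: it is the same step-up rule with the thresholds divided by c(n) = 1 + 1/2 + ... + 1/n. The function name `by_reject` and the example values below are illustrative assumptions, not taken from the 2001 paper's text or from any package.

```python
def by_reject(pvalues, alpha=0.05):
    """Benjamini-Yekutieli step-up procedure (illustrative sketch).

    Same rule as BH, but each threshold is divided by the harmonic
    sum c(n) = 1 + 1/2 + ... + 1/n.
    """
    n = len(pvalues)
    c_n = sum(1.0 / i for i in range(1, n + 1))
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k / (n * c(n)) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / (n * c_n) * alpha:
            k_max = rank
    rejected = [False] * n
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four tests.
print(by_reject([0.01, 0.5, 0.035, 0.03], alpha=0.05))
```

With these four p-values, c(4) is about 2.08, so the smallest threshold drops from 0.0125 to 0.006 and nothing is rejected, whereas the plain BH rule rejects three of the four.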

In 2002, I re-interpreted the Benjamini and Hochberg (BH) and Benjamini and Yekutieli (BY) procedures in terms of adjusted p-values. I implemented the resulting algorithms in the function p.adjust() in the stats package, and used them in the limma package, and this led to the concept of an FDR adjusted p-value. The terminology used by the p.adjust() function and the limma package has led people to refer to "BH adjusted p-values".

The adjusted p-value definition that you give is essentially the same as the BH adjusted p-value, except that you omitted the last step in the procedure. Your definition as it stands is not an increasing function of the original p-values.
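
The omitted last step is a running minimum taken from the largest p-value downwards, with the result capped at 1. The sketch below contrasts the formula from the question with the full BH adjustment. It is a plain-Python illustration written to mirror the calculation, not the p.adjust() source, and the function names and example values are made up.

```python
def naive_adjust(pvalues):
    """The formula from the question: p * n / rank, with no further step."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = pvalues[idx] * n / rank
    return adjusted


def bh_adjust(pvalues):
    """BH adjusted p-values: p * n / rank, then a running minimum
    from the largest p-value downwards, capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also applies the cap
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, already sorted for readability.
pvals = [0.01, 0.03, 0.035, 0.5]
print(naive_adjust(pvals))  # second value exceeds the third: not monotone
print(bh_adjust(pvals))     # non-decreasing in the original p-values
```

For these values the naive formula gives roughly 0.04, 0.06, 0.047, 0.5, so the second p-value would fail a 0.05 cutoff while the larger third one passes. After the running minimum, the second value becomes about 0.047 and three hypotheses have adjusted p-values at or below 0.05, the same three that the step-up procedure rejects at FDR level 0.05.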

In 2002, John Storey created a new definition of "false discovery rate". Storey's definition is based on Benjamini and Hochberg's original idea, but is mathematically a bit more flexible. John Storey also created the terminology "q-value" for a quantity that estimates his definition of FDR. He implemented q-value estimation procedures in an R package called qvalue.

Another important but often overlooked difference is the idea of FDR "estimation" vs FDR "control". The qvalue package attempts to give a more or less unbiased estimate of the FDR, so the true FDR is about equally likely to be greater or less than the estimate in practice. The BH approach instead controls the expected FDR. It guarantees that the true FDR will be less than the specified rate on average if you do an exactly similar experiment over and over again. So the BH approach is slightly more conservative than qvalue. The BH properties hold regardless of the number of p-values, while qvalue is asymptotic, so the BH approach is more robust than qvalue when the number of hypotheses being tested isn't very large.

So, strictly speaking, the q-value and the FDR adjusted p-value are similar but not quite the same. However, the terms q-value and FDR adjusted p-value are often used generically by the Bioconductor community to refer to any quantity that controls or estimates any definition of the FDR. In this general sense the terms are synonyms.

The lesson to draw from this is that different methods and different packages are trying to do slightly different things and give slightly different results, and you should always cite the specific software and method that you have used.

Best wishes
Gordon

Tim Triche (United States) wrote, 4.9 years ago:
In reply to Jack Luo's message of Wed, Dec 19, 2012:

p-value = extremal probability for a test statistic under the null hypothesis, not accounting for multiple comparisons

BH p-value, pBH = extremal probability for the same, after accounting for multiple comparisons to upper-bound the overall false discovery rate at <= p

q-value = direct estimate of the FDR associated with pBH

See http://genomics.princeton.edu/storeylab/papers/directfdr.pdf for the original, and quite well written, paper, where on page 485:

"The basic point that we make is that using the Benjamini and Hochberg (1995) method to control FDR at level α/π0 is equivalent to (i.e. rejects the same p-values as) using the proposed method to control FDR at level α. The gain in power from our approach is clear--we control a smaller error rate (α ≤ α/π0), yet reject the same number of tests."

q-values depend also on the estimated fraction of test p-values in the chance or uniform component of the distribution at some pFDR p.

pi0 = estimated probability (overall) of a given result being truly null (i.e., false positive) at p | FDR

q-value = BH p-value * pi0 (probability that test t incorrectly rejects the null at pBH)

So q = pBH * pi0 (++), as can be verified from the output, and directly estimates the pFDR for test t assuming independence among the tests. The mathematical justification for this is given in the paper; the basic machinery can be, and has been, extended to many other situations.

(++) If pi0 is estimated as at or very near 1.0, then pBH and q will be the same for any given test t, to the limit of machine precision (see paper).

At least that's how it appears to be implemented last time I looked at the code and the paper :-)
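
The relation q = pBH * pi0 can be sketched as below. This is an illustration under stated assumptions, not the qvalue package's code: pi0 is estimated with a single fixed tuning value `lam` (the fraction of p-values above `lam`, rescaled by 1 - `lam`), which is only one of several possible estimators, and the function names and example p-values are made up.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam.

    Assumption: a single fixed lam; other pi0 estimators exist.
    """
    n = len(pvalues)
    if n == 0:
        return 1.0
    pi0 = sum(1 for p in pvalues if p > lam) / (n * (1.0 - lam))
    return min(pi0, 1.0)  # a proportion cannot exceed 1


def bh_adjust(pvalues):
    """BH adjusted p-values (running minimum of p * n / rank, capped at 1)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def q_values(pvalues, lam=0.5):
    """q = pi0 * BH adjusted p-value, following the relation in the post."""
    pi0 = estimate_pi0(pvalues, lam)
    return [pi0 * a for a in bh_adjust(pvalues)]


# Hypothetical p-values: two of eight lie above lam = 0.5, so pi0 = 0.5.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.2, 0.6, 0.9]
print(estimate_pi0(pvals))
print(bh_adjust(pvals))
print(q_values(pvals))
```

In this example every q-value is exactly half the corresponding BH adjusted p-value. When the estimate of pi0 reaches 1 (for instance, when most p-values lie above `lam`), the two quantities coincide, as the (++) note says.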