Question: limma - FDR adjusted "p-values"
12.8 years ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:
> Date: Mon, 31 Jan 2005 09:56:09 -0500
> From: Naomi Altman <naomi@stat.psu.edu>
> Subject: [BioC] limma - FDR adjusted "p-values"
> To: bioconductor@stat.math.ethz.ch
>
> Just a suggestion:
>
> The FDR adjusted "p-values" are called "q-values" in much of the
> literature. I suggest that limma follow suit,

It's certainly true that a lot of users have trouble with FDR and with adjusted p-values in general. Perhaps you're right that limma should use the term "q-values". This would associate p-values with control/estimation of FWER and q-values with control/estimation of FDR.

The reason I haven't done this so far is that the term "q-value" coined by John Storey seems to me to measure something slightly different from Benjamini and Hochberg adjusted p-values. I think that John Storey's q-value uses a slightly different definition of false discovery rate, namely pFDR, the positive false discovery rate. Also I think it usually estimates pFDR rather than formally controlling it. Although there is a value "Q" which appears in Benjamini and Hochberg's formulations, and it is closely related to q-values, it is not exactly the same. So I have been reluctant to use the term "q-value" for things which were not quite the same, as this would cloud the fine meaning of the term. Perhaps I am splitting hairs here and should just accept the broad definition of q-value for FDR or pFDR and p-value for FWER. Any other opinions?

I have also thought that perhaps topTable() should label the p-value/q-value column in the output to indicate which adjustment method was used to generate the table.

> and also add a line to the
> documentation (it might already be there and I missed it)
>
> "If the number of significant results at level alpha is less than
> alpha*(number of genes), then the q-value will be 1.0."
>
> It seems like I have to explain this to just about every investigator who
> runs into this.

I get a lot of questions about this as well. Actually, the statement you've made isn't always true, although it usually is. Even if the smallest p-value out of n genes is only as small as 1/n, the "fdr" adjusted p-value is not always 1. It can be as small as 1/n depending on the other n-1 p-values.

Perhaps the way to go would be for topTable() to output the raw p-values as well as the adjusted p-values/q-values. I haven't done this so as to keep the table as small as possible, but it would prevent users from being presented with just a list of p-values all equal to 1. What do you think?

Gordon

> Naomi S. Altman                  814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics              814-863-7114 (fax)
> Penn State University            814-865-1348 (Statistics)
> University Park, PA 16802-2111
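The point about 1/n can be checked numerically. The following is a minimal sketch, not limma's own code: `bh_adjust` is a hypothetical helper written for illustration that implements the standard Benjamini-Hochberg step-up adjustment (the same quantity that R's `p.adjust(p, method = "BH")` computes). It compares two sets of n p-values whose smallest value is 1/n in both cases.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Illustrative helper (not part of limma): for the i-th smallest
    p-value the adjusted value is min over j >= i of n * p_(j) / j.
    """
    n = len(pvalues)
    # Visit the p-values from largest to smallest, keeping a running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # rank of this p-value when sorted ascending
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n = 10

# Smallest p-value is 1/n and the other n-1 are large:
# every adjusted value comes out as 1.
sparse = [1.0 / n] + [1.0] * (n - 1)
print(bh_adjust(sparse))

# All n p-values equal 1/n: every adjusted value is 1/n,
# even though the smallest raw p-value is the same as before.
tied = [1.0 / n] * n
print(bh_adjust(tied))
```

The two inputs share the same smallest p-value, so the difference in the adjusted values comes entirely from the other n-1 p-values.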
12.8 years ago, Sean Davis (United States) wrote:
On Feb 1, 2005, at 7:30 AM, Gordon K Smyth wrote:

> Perhaps I am splitting hairs here and should just accept the broad
> definition of q-value for FDR or pFDR and p-value for FWER. Any other
> opinions?
>
> I have also thought that perhaps topTable() should label the
> p-value/q-value column in the output to indicate which adjustment
> method was used to generate the table.

I think the latter (label the p-value/q-value column) would suffice and be the most general solution. Unfortunately, FDR is foreign to many researchers, so it demands an explanation by someone in-the-know, no matter what. I'm not sure that calling a p-value a different name will satisfy the need for researchers to know the quantity that summarizes their data. In short, I see the labeling issue as separate from the FDR understanding issue. Is that fair?

Sean
12.8 years ago, michael watson IAH-C wrote:
We use limma a lot, and from our point of view having both adjusted and unadjusted p-values in the topTable() output would be beneficial.

Thanks
Mick
12.8 years ago, Naomi Altman wrote:
I think it would be useful to have both the p-values and the "q-values".

The "q-values" should not be called "adjusted p-values" because they are not probabilities. They are the estimated FDR at the largest p-value for which the gene would be statistically significant. Perhaps they should be called "fdr-values".

My vote is for Gordon to invent a name and then use it. As LIMMA becomes more popular, the terminology will migrate to popular usage.

Cheers,
Naomi

Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111
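One way to make the interpretation above concrete: a gene's Benjamini-Hochberg adjusted value is the smallest FDR level alpha at which the step-up procedure would call that gene significant. The sketch below is only an illustration under that reading. The helpers `bh_adjust` and `bh_reject` and the ten p-values are made up for the example and are not taken from limma or from this thread.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # rank of this p-value when sorted ascending
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bh_reject(pvalues, alpha):
    """Step-up rule: find the largest rank k with p_(k) <= alpha * k / n
    and reject the k smallest p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / n:
            k = rank
    rejected = [False] * n
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for ten genes (illustration only).
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
           0.061, 0.074, 0.205, 0.212, 0.216]

adjusted = bh_adjust(pvalues)
for alpha in (0.02, 0.05, 0.10, 0.25):
    by_rule = bh_reject(pvalues, alpha)
    by_adjusted = [q <= alpha for q in adjusted]
    # Thresholding the adjusted values gives the same gene list
    # as running the step-up procedure at that alpha.
    print(alpha, sum(by_rule), by_rule == by_adjusted)
```

Because the adjusted values are thresholded against an FDR level rather than read as tail probabilities of a null distribution, this also illustrates why the name "adjusted p-value" can mislead.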