limma - FDR adjusted "p-values"
@gordon-smyth
WEHI, Melbourne, Australia
> Date: Mon, 31 Jan 2005 09:56:09 -0500
> From: Naomi Altman <naomi@stat.psu.edu>
> Subject: [BioC] limma - FDR adjusted "p-values"
> To: bioconductor@stat.math.ethz.ch
>
> Just a suggestion:
>
> The FDR adjusted "p-values" are called "q-values" in much of the
> literature. I suggest that limma follow suit,

It's certainly true that a lot of users have trouble with FDR and with adjusted p-values in general. Perhaps you're right that limma should use the term "q-values". This would associate p-values with control/estimation of FWER and q-values with control/estimation of FDR.

The reason I haven't done this so far is that the term "q-value", coined by John Storey, seems to me to measure something slightly different to Benjamini and Hochberg adjusted p-values. I think that John Storey's q-value uses a slightly different definition of false discovery rate, namely pFDR, the positive false discovery rate. Also, I think it usually estimates pFDR rather than formally controlling it. Although there is a value "Q" which appears in Benjamini and Hochberg's formulations, and it is closely related to q-values, it is not exactly the same. So I have been reluctant to use the term "q-value" for things which were not quite the same, as this would cloud the fine meaning of the term. Perhaps I am splitting hairs here and should just accept the broad definition of q-value for FDR or pFDR and p-value for FWER. Any other opinions?

I have also thought that perhaps topTable() should label the p-value/q-value column in the output to indicate which adjustment method was used to generate the table.

> and also add a line to the
> documentation (it might already be there and I missed it)
>
> "If the number of significant results at level alpha is less than
> alpha*(number of genes), then the q-value will be 1.0."
>
> It seems like I have to explain this to just about every investigator who
> runs into this.

I get a lot of questions about this as well. Actually, the statement you've made isn't always true, although it usually is. Even if the smallest p-value out of n genes is only as small as 1/n, the "fdr" adjusted p-value is not always 1. It can be as small as 1/n depending on the other n-1 p-values.

Perhaps the way to go would be for topTable() to output the raw p-values as well as the adjusted p-values/q-values. I haven't done this so as to keep the table as small as possible, but it would prevent users from being presented with just a list of p-values all equal to 1. What do you think?

Gordon
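The 1/n remark can be checked numerically. What follows is only a minimal Python sketch of the Benjamini and Hochberg step-up adjustment, not limma's own code; the function name `bh_adjust` and the two toy p-value vectors are made up for this illustration. It assumes the usual definition, in which the adjusted value for the k-th smallest of n p-values is the minimum of n * p_(j) / j over all j >= k.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # rank n for the largest p-value, 1 for the smallest
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


n = 10
# Hypothetical case 1: smallest p-value is 1/n, all other p-values are 1.
one_small = [1 / n] + [1.0] * (n - 1)
# Hypothetical case 2: every p-value equals 1/n.
all_tied = [1 / n] * n

print(min(bh_adjust(one_small)))
print(min(bh_adjust(all_tied)))
```

In both toy cases the smallest raw p-value is 1/n. In the first, every adjusted value comes out as 1; in the second, every adjusted value is 1/n, because the largest p-value sits at rank n and is multiplied by n/n. The result therefore depends on the other n-1 p-values, as stated above.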
@sean-davis-490
United States
On Feb 1, 2005, at 7:30 AM, Gordon K Smyth wrote:

> I have also thought that perhaps topTable() should label the
> p-value/q-value column in the output to indicate which adjustment
> method was used to generate the table.

I think the latter (labeling the p-value/q-value column) would suffice and be the most general solution. Unfortunately, FDR is foreign to many researchers, so it demands an explanation by someone in the know, no matter what. I'm not sure that calling a p-value by a different name will satisfy the need for researchers to know the quantity that summarizes their data. In short, I see the labeling issue as separate from the FDR understanding issue. Is that fair?

Sean
@michael-watson-iah-c-378
We use limma a lot, and from our point of view having both adjusted and unadjusted p-values in the topTable() output would be beneficial.

Thanks
Mick

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Gordon K Smyth
Sent: 01 February 2005 12:31
To: Naomi Altman
Cc: jstorey@u.washington.edu; bioconductor@stat.math.ethz.ch
Subject: [BioC] limma - FDR adjusted "p-values"

Perhaps the way to go would be for topTable() to output the raw p-values as well as the adjusted p-values/q-values. I haven't done this so as to keep the table as small as possible, but it would prevent users from being presented with just a list of p-values all equal to 1. What do you think?

Gordon
Naomi Altman ★ 6.0k
@naomi-altman-380
United States
I think it would be useful to have both the p-values and the "q-values". The "q-values" should not be called "adjusted p-values" because they are not probabilities. They are the estimated FDR at the largest p-value for which the gene would be statistically significant. Perhaps they should be called "fdr-values".

My vote is for Gordon to invent a name and then use it. As LIMMA becomes more popular, the terminology will migrate to popular usage.

Cheers,
Naomi

Naomi S. Altman                    814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                814-863-7114 (fax)
Penn State University              814-865-1348 (Statistics)
University Park, PA 16802-2111
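One common way to read a Benjamini and Hochberg adjusted value, in the spirit of the description above, is as the smallest FDR level at which that gene would still be called significant. The Python sketch below checks this on a made-up set of ten p-values; it is an illustration under that assumption, not limma's code, and the names `bh_adjust` and `bh_reject` and the data are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank n for the largest p-value
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def bh_reject(pvalues, alpha):
    """Indices called significant by the BH step-up procedure at level alpha."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # smallest first
    n_rejected = 0
    for k, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest k with p_(k) <= k * alpha / n.
        if pvalues[idx] <= k * alpha / n:
            n_rejected = k
    return set(order[:n_rejected])


# Made-up p-values for ten genes.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adjusted = bh_adjust(pvalues)

for alpha in (0.05, 0.09, 0.2):
    by_threshold = {i for i, q in enumerate(adjusted) if q <= alpha}
    by_step_up = bh_reject(pvalues, alpha)
    print(alpha, len(by_step_up), by_threshold == by_step_up)
```

For each level tried, thresholding the adjusted values selects the same genes as running the step-up procedure directly at that level. This is why a table of adjusted values can stand in for rerunning the procedure at every cutoff of interest.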