Q-values way smaller than P-values! Is that kosher?
2
0
Entering edit mode
Joshua Banta ▴ 30
@joshua-banta-6311
Last seen 4 months ago
United States
Dear listserv, Consider the following vector of P-values: p.values <- c( 0.266845129, 0.353675275, 0.030585143, 0.791285527, 0.60805832, 0.433466716, 0.039369439, 0.48918122, 0.740626586 ) When I run the qvalue function of the qvalue package, I get a rather surprising outcome: the q-values are much smaller than the corresponding p-values! See the following code below: source("http://bioconductor.org/biocLite.R") biocLite("qvalue") library(qvalue) q.values <- qvalue(p.values)$qvalue comparison <- cbind(p.values, q.values) colnames(comparison) <- c("p.values", "q.values") comparison p.values q.values [1,] 0.26684513 0.037111560 [2,] 0.35367528 0.037111560 [3,] 0.03058514 0.008960246 [4,] 0.79128553 0.040020397 [5,] 0.60805832 0.039540110 [6,] 0.43346672 0.037111560 [7,] 0.03936944 0.008960246 [8,] 0.48918122 0.037111560 [9,] 0.74062659 0.040020397 So what is happening here? Can I trust the q-values? In each case they are substantially larger than the P-values. Especially troubling to me are rows 4, 5, and 9. Thanks in advance for any information/advice! ----------------------------------- Josh Banta, Ph.D Assistant Professor Department of Biology The University of Texas at Tyler Tyler, TX 75799 Tel: (903) 565-5655 http://plantevolutionaryecology.org [[alternative HTML version deleted]] qvalue qvalue • 3.2k views ADD COMMENT 0 Entering edit mode Guest User ★ 12k @guest-user-4897 Last seen 8.4 years ago Dear listserv, Consider the following vector of P-values: p.values <- c( 0.266845129, 0.353675275, 0.030585143, 0.791285527, 0.60805832, 0.433466716, 0.039369439, 0.48918122, 0.740626586 ) When I run the qvalue function of the qvalue package, I get a rather surprising outcome: the q-values are much smaller than the corresponding p-values! See the following code below: source("http://bioconductor.org/biocLite.R") biocLite("qvalue") library(qvalue) q.values <- qvalue(p.values)$qvalue comparison <- cbind(p.values, q.values) colnames(comparison) <- c("p.values", "q.values") comparison p.values q.values [1,] 0.26684513 0.037111560 [2,] 0.35367528 0.037111560 [3,] 0.03058514 0.008960246 [4,] 0.79128553 0.040020397 [5,] 0.60805832 0.039540110 [6,] 0.43346672 0.037111560 [7,] 0.03936944 0.008960246 [8,] 0.48918122 0.037111560 [9,] 0.74062659 0.040020397 So what is happening here? Can I trust the q-values? In each case they are substantially larger than the P-values. Especially troubling to me are rows 4, 5, and 9. Thanks in advance for any information/advice! ----------------------------------- Josh Banta, Ph.D Assistant Professor Department of Biology The University of Texas at Tyler Tyler, TX 75799 Tel: (903) 565-5655 http://plantevolutionaryecology.org -- output of sessionInfo(): R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.16.8 BiocInstaller_1.10.4 qvalue_1.34.0 [4] gsubfn_0.6-5 proto_0.3-10 emma_1.1.2 loaded via a namespace (and not attached): [1] tcltk_3.0.1 tools_3.0.1 -- Sent via the guest posting facility at bioconductor.org.
0
Entering edit mode
@mikelove
Last seen 14 hours ago
United States
hi Joshua, While you wait for a definitive answer from the package maintainer, note that the methods in the package are designed for datasets with number of tests, m, in the thousands. for example, the reference mentions: "Because we are considering many features (i.e., *m* is very large), it can be shown that..." http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/ Mike On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta@uttyler.edu> wrote: > Dear listserv, > > Consider the following vector of P-values: > > p.values <- c( > 0.266845129, > 0.353675275, > 0.030585143, > 0.791285527, > 0.60805832, > 0.433466716, > 0.039369439, > 0.48918122, > 0.740626586 > ) > > When I run the qvalue function of the qvalue package, I get a rather > surprising outcome: the q-values are much smaller than the corresponding > p-values! > > See the following code below: > > source("http://bioconductor.org/biocLite.R") > biocLite("qvalue") > library(qvalue) > > q.values <- qvalue(p.values)$qvalue > > comparison <- cbind(p.values, q.values) > > colnames(comparison) <- c("p.values", "q.values") > > comparison > > p.values q.values > [1,] 0.26684513 0.037111560 > [2,] 0.35367528 0.037111560 > [3,] 0.03058514 0.008960246 > [4,] 0.79128553 0.040020397 > [5,] 0.60805832 0.039540110 > [6,] 0.43346672 0.037111560 > [7,] 0.03936944 0.008960246 > [8,] 0.48918122 0.037111560 > [9,] 0.74062659 0.040020397 > > So what is happening here? Can I trust the q-values? In each case they are > substantially larger than the P-values. Especially troubling to me are rows > 4, 5, and 9. > > Thanks in advance for any information/advice! > ----------------------------------- > Josh Banta, Ph.D > Assistant Professor > Department of Biology > The University of Texas at Tyler > Tyler, TX 75799 > Tel: (903) 565-5655 > http://plantevolutionaryecology.org > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] ADD COMMENT 0 Entering edit mode Hi Josh, Of course it's not statistically defensible, and you can't use these qvalues. As Mike Love points out, qvalue just isn't intended for small sets of p-values like this. The main problem is that qvalue's estimate of pi0 (the proportion of true nulls) is not reliable for small numbers of tests. For your data, qvalue estimates pi0 to be  > qvalue(p.values)$pi0
[1] 0.05057643


which is equivalent to saying that most likely all the null hypotheses should be rejected.

Some other methods of estimating pi0 are more robust. For example the limma function propTrueNull gives pi0=0.75 for the same data:

   > library(limma)
> propTrueNull(p.values)
[1] 0.7506187


The qvalue method of estimating pi0 is sensitive to the size of the largest p.values. If you changed just the 0.79 p-value to 0.99, all the qvalues would look very different:

   > data.frame(p.values,q.values=qvalue(p.values)$qvalues) p.values q.values 1 0.26684513 0.5509513 2 0.35367528 0.5509513 3 0.03058514 0.1330221 4 0.99900000 0.7500975 5 0.60805832 0.5870052 6 0.43346672 0.5509513 7 0.03936944 0.1330221 8 0.48918122 0.5509513 9 0.74062659 0.6256105  For very small sets of p-values, it seems to me to be safer to use the Benjamini and Hochberg algorithm (available in the p.adjust function). BH is more conservative than qvalue, but it doesn't rely on an estimator for pi0 and is trustworthy for any number of tests. ADD REPLY 0 Entering edit mode Thank you, Dr. Smyth. I am glad to know this. Very helpful. At what number of P-values would you suggest it is "safe" to use Storey's Q-value method? For instance, what if I had 300 P-values? What if I had 100? What if I had 1000? On Jan 12, 2014, at 2:34 AM, Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Hi Josh, Of course it's not statistically defensible, and you can't use these qvalues. As Mike Love points out, qvalue just isn't intended for small sets of p-values like this. The main problem is that qvalue's estimate of pi0 (the proportion of true nulls) is not reliable for small numbers of tests. For your data, qvalue estimates pi0 to be > qvalue(p.values)$pi0 [1] 0.05057643 which is equivalent to saying that most likely all the null hypotheses should be rejected. Some other methods of estimating pi0 are more robust. For example the limma function propTrueNull gives pi0=0.75 for the same data: > library(limma) > propTrueNull(p.values) [1] 0.7506187 The qvalue method of estimating pi0 is sensitive to the size of the largest p.values. If you changed just the 0.79 p-value to 0.99, all the qvalues would look very different: > data.frame(p.values,q.values=qvalue(p.values)$qvalues) p.values q.values 1 0.26684513 0.5509513 2 0.35367528 0.5509513 3 0.03058514 0.1330221 4 0.99900000 0.7500975 5 0.60805832 0.5870052 6 0.43346672 0.5509513 7 0.03936944 0.1330221 8 0.48918122 0.5509513 9 0.74062659 0.6256105 For very small sets of p-values, it seems to me to be safer to use the Benjamini and Hochberg algorithm (available in the p.adjust function). BH is more conservative than qvalue, but it doesn't rely on an estimator for pi0 and is trustworthy for any number of tests. Best wishes Gordon On Sat, 11 Jan 2014, Michael Love wrote: hi Joshua, While you wait for a definitive answer from the package maintainer, note that the methods in the package are designed for datasets with number of tests, m, in the thousands. for example, the reference mentions: "Because we are considering many features (i.e., *m* is very large), it can be shown that..." http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/ Mike On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta@uttyler.edu> wrote: Dear listserv, Consider the following vector of P-values: p.values <- c( 0.266845129, 0.353675275, 0.030585143, 0.791285527, 0.60805832, 0.433466716, 0.039369439, 0.48918122, 0.740626586 ) When I run the qvalue function of the qvalue package, I get a rather surprising outcome: the q-values are much smaller than the corresponding p-values! See the following code below: source("http://bioconductor.org/biocLite.R") biocLite("qvalue") library(qvalue) q.values <- qvalue(p.values)$qvalue comparison <- cbind(p.values, q.values) colnames(comparison) <- c("p.values", "q.values") comparison p.values q.values [1,] 0.26684513 0.037111560 [2,] 0.35367528 0.037111560 [3,] 0.03058514 0.008960246 [4,] 0.79128553 0.040020397 [5,] 0.60805832 0.039540110 [6,] 0.43346672 0.037111560 [7,] 0.03936944 0.008960246 [8,] 0.48918122 0.037111560 [9,] 0.74062659 0.040020397 So what is happening here? Can I trust the q-values? In each case they are substantially larger than the P-values. Especially troubling to me are rows 4, 5, and 9. Thanks in advance for any information/advice! ----------------------------------- Josh Banta, Ph.D Assistant Professor Department of Biology The University of Texas at Tyler Tyler, TX 75799 Tel: (903) 565-5655 http://plantevolutionaryecology.org ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:20}}
0
Entering edit mode
Hi Joshua, I haven't seen Dr. Storey respond yet, but you can refer to a paper he wrote with Jon Taylor and David Siegmund, where they use minimal numbers (1000 tests and 3000 tests) of tests to run simulations of power and [p]FDR control: http://genomics.princeton.edu/storeylab/papers/623.pdf It's worth noting that q = p_BH * pi0, so if you botch the estimate of pi0, everything else is suspect. I wouldn't trust q-values for less than 1000 tests, especially if the estimate of pi0 is small. As Dr. Smyth stated, one might just as well conservatively leave pi0 estimated at 1.0 and use B&H fdr estimates (because with pi0=1, p_BH is equivalent to q) when n_t is small, e.g. < 1000. On the flip side, however, if you are comparing sequential tests (e.g. in a replication situation), where the numbers are often much larger than 1000, q-values allow you to avoid penalizing more powerful methods by correcting for the potentially different proportions of rejected tests at a given alpha. This is cool, and it's one of the best reasons to use q-values when they make sense. Storey and Siegmund also wrote a paper recasting q-value estimation as a Bayesian method for pFDR control, where your prior probabilities for pi0 and pi1 are estimated from a Beta-Uniform mixture model (often seen as "BUM" in various later implementations that seek to extend the q-value paradigm). This is not so different from what eBayes does, except that the estimates are made upon the typically ill-posed problem of estimating parameters for a mixture of distributions whose compositions are also unknown. Thus, and in contrast to Dr. Smyth's eBayes procedures, you really do need a large population of tests (members of each mixture component) to benefit from q-values' greater power, as that power comes at the cost of lost FDR control if pi0 is estimated poorly. Hopefully Dr. Storey and/or Dr. Smyth will chime in if I have stated anything incorrect here, although I did refer to the primary sources for good measure ;-) Best, --t --t On Sun, Jan 12, 2014 at 8:50 AM, Joshua Banta <jbanta@uttyler.edu> wrote: > Thank you, Dr. Smyth. I am glad to know this. Very helpful. > > At what number of P-values would you suggest it is "safe" to use Storey's > Q-value method? For instance, what if I had 300 P-values? What if I had > 100? What if I had 1000? > > On Jan 12, 2014, at 2:34 AM, Gordon K Smyth <smyth@wehi.edu.au<mailto:> smyth@wehi.EDU.AU>> wrote: > > Hi Josh, > > Of course it's not statistically defensible, and you can't use these > qvalues. As Mike Love points out, qvalue just isn't intended for small > sets of p-values like this. > > The main problem is that qvalue's estimate of pi0 (the proportion of true > nulls) is not reliable for small numbers of tests. For your data, qvalue > estimates pi0 to be > > > qvalue(p.values)$pi0 > [1] 0.05057643 > > which is equivalent to saying that most likely all the null hypotheses > should be rejected. > > Some other methods of estimating pi0 are more robust. For example the > limma function propTrueNull gives pi0=0.75 for the same data: > > > library(limma) > > propTrueNull(p.values) > [1] 0.7506187 > > The qvalue method of estimating pi0 is sensitive to the size of the > largest p.values. If you changed just the 0.79 p-value to 0.99, all the > qvalues would look very different: > > > data.frame(p.values,q.values=qvalue(p.values)$qvalues) > p.values q.values > 1 0.26684513 0.5509513 > 2 0.35367528 0.5509513 > 3 0.03058514 0.1330221 > 4 0.99900000 0.7500975 > 5 0.60805832 0.5870052 > 6 0.43346672 0.5509513 > 7 0.03936944 0.1330221 > 8 0.48918122 0.5509513 > 9 0.74062659 0.6256105 > > For very small sets of p-values, it seems to me to be safer to use the > Benjamini and Hochberg algorithm (available in the p.adjust function). BH > is more conservative than qvalue, but it doesn't rely on an estimator for > pi0 and is trustworthy for any number of tests. > > Best wishes > Gordon > > > On Sat, 11 Jan 2014, Michael Love wrote: > > hi Joshua, > > While you wait for a definitive answer from the package maintainer, note > that the methods in the package are designed for datasets with number of > tests, m, in the thousands. > > for example, the reference mentions: > > "Because we are considering many features (i.e., *m* is very large), it can > be shown that..." > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/ > > > Mike > > > > > On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta@uttyler.edu> wrote: > > Dear listserv, > > Consider the following vector of P-values: > > p.values <- c( > 0.266845129, > 0.353675275, > 0.030585143, > 0.791285527, > 0.60805832, > 0.433466716, > 0.039369439, > 0.48918122, > 0.740626586 > ) > > When I run the qvalue function of the qvalue package, I get a rather > surprising outcome: the q-values are much smaller than the corresponding > p-values! > > See the following code below: > > source("http://bioconductor.org/biocLite.R") > biocLite("qvalue") > library(qvalue) > > q.values <- qvalue(p.values)$qvalue > > comparison <- cbind(p.values, q.values) > > colnames(comparison) <- c("p.values", "q.values") > > comparison > > p.values q.values > [1,] 0.26684513 0.037111560 > [2,] 0.35367528 0.037111560 > [3,] 0.03058514 0.008960246 > [4,] 0.79128553 0.040020397 > [5,] 0.60805832 0.039540110 > [6,] 0.43346672 0.037111560 > [7,] 0.03936944 0.008960246 > [8,] 0.48918122 0.037111560 > [9,] 0.74062659 0.040020397 > > So what is happening here? Can I trust the q-values? In each case they are > substantially larger than the P-values. Especially troubling to me are rows > 4, 5, and 9. > > Thanks in advance for any information/advice! > ----------------------------------- > Josh Banta, Ph.D > Assistant Professor > Department of Biology > The University of Texas at Tyler > Tyler, TX 75799 > Tel: (903) 565-5655 > http://plantevolutionaryecology.org > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:13}} ADD REPLY 0 Entering edit mode Hi Joshua (and others), One reason why Dr. Storey might not be responding is that his old UW email address is dead. I noticed that he has used his Princeton address for the more recent 'sva' and 'trigger' packages, but inasmuch as q-values are widely used and often misunderstood, perhaps the same would be helpful for 'qvalue'. On the other hand, perhaps there would be an enormous and time-consuming raft of emails just as well handled on mailing lists; I'm in no position to say. Anyways, I'm forwarding this along to Dr. Storey so that he doesn't passively become associated with anything incorrect I might have said in my reply. Best, --t --t On Sun, Jan 12, 2014 at 1:46 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > Hi Joshua, > > I haven't seen Dr. Storey respond yet, but you can refer to a paper he > wrote with Jon Taylor and David Siegmund, where they use minimal numbers > (1000 tests and 3000 tests) of tests to run simulations of power and [p]FDR > control: > > http://genomics.princeton.edu/storeylab/papers/623.pdf > > It's worth noting that q = p_BH * pi0, so if you botch the estimate of > pi0, everything else is suspect. I wouldn't trust q-values for less than > 1000 tests, especially if the estimate of pi0 is small. As Dr. Smyth > stated, one might just as well conservatively leave pi0 estimated at 1.0 > and use B&H fdr estimates (because with pi0=1, p_BH is equivalent to q) > when n_t is small, e.g. < 1000. > > On the flip side, however, if you are comparing sequential tests (e.g. in > a replication situation), where the numbers are often much larger than > 1000, q-values allow you to avoid penalizing more powerful methods by > correcting for the potentially different proportions of rejected tests at a > given alpha. This is cool, and it's one of the best reasons to use > q-values when they make sense. > > Storey and Siegmund also wrote a paper recasting q-value estimation as a > Bayesian method for pFDR control, where your prior probabilities for pi0 > and pi1 are estimated from a Beta-Uniform mixture model (often seen as > "BUM" in various later implementations that seek to extend the q-value > paradigm). This is not so different from what eBayes does, except that the > estimates are made upon the typically ill-posed problem of estimating > parameters for a mixture of distributions whose compositions are also > unknown. Thus, and in contrast to Dr. Smyth's eBayes procedures, you > really do need a large population of tests (members of each mixture > component) to benefit from q-values' greater power, as that power comes at > the cost of lost FDR control if pi0 is estimated poorly. > > Hopefully Dr. Storey and/or Dr. Smyth will chime in if I have stated > anything incorrect here, although I did refer to the primary sources for > good measure ;-) > > Best, > > --t > > > --t > > > On Sun, Jan 12, 2014 at 8:50 AM, Joshua Banta <jbanta@uttyler.edu> wrote: > >> Thank you, Dr. Smyth. I am glad to know this. Very helpful. >> >> At what number of P-values would you suggest it is "safe" to use Storey's >> Q-value method? For instance, what if I had 300 P-values? What if I had >> 100? What if I had 1000? >> >> On Jan 12, 2014, at 2:34 AM, Gordon K Smyth <smyth@wehi.edu.au<mailto:>> smyth@wehi.EDU.AU>> wrote: >> >> Hi Josh, >> >> Of course it's not statistically defensible, and you can't use these >> qvalues. As Mike Love points out, qvalue just isn't intended for small >> sets of p-values like this. >> >> The main problem is that qvalue's estimate of pi0 (the proportion of true >> nulls) is not reliable for small numbers of tests. For your data, qvalue >> estimates pi0 to be >> >> > qvalue(p.values)$pi0 >> [1] 0.05057643 >> >> which is equivalent to saying that most likely all the null hypotheses >> should be rejected. >> >> Some other methods of estimating pi0 are more robust. For example the >> limma function propTrueNull gives pi0=0.75 for the same data: >> >> > library(limma) >> > propTrueNull(p.values) >> [1] 0.7506187 >> >> The qvalue method of estimating pi0 is sensitive to the size of the >> largest p.values. If you changed just the 0.79 p-value to 0.99, all the >> qvalues would look very different: >> >> > data.frame(p.values,q.values=qvalue(p.values)$qvalues) >> p.values q.values >> 1 0.26684513 0.5509513 >> 2 0.35367528 0.5509513 >> 3 0.03058514 0.1330221 >> 4 0.99900000 0.7500975 >> 5 0.60805832 0.5870052 >> 6 0.43346672 0.5509513 >> 7 0.03936944 0.1330221 >> 8 0.48918122 0.5509513 >> 9 0.74062659 0.6256105 >> >> For very small sets of p-values, it seems to me to be safer to use the >> Benjamini and Hochberg algorithm (available in the p.adjust function). BH >> is more conservative than qvalue, but it doesn't rely on an estimator for >> pi0 and is trustworthy for any number of tests. >> >> Best wishes >> Gordon >> >> >> On Sat, 11 Jan 2014, Michael Love wrote: >> >> hi Joshua, >> >> While you wait for a definitive answer from the package maintainer, note >> that the methods in the package are designed for datasets with number of >> tests, m, in the thousands. >> >> for example, the reference mentions: >> >> "Because we are considering many features (i.e., *m* is very large), it >> can >> be shown that..." >> >> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC170937/ >> >> >> Mike >> >> >> >> >> On Fri, Jan 10, 2014 at 1:08 PM, Joshua Banta <jbanta@uttyler.edu> wrote: >> >> Dear listserv, >> >> Consider the following vector of P-values: >> >> p.values <- c( >> 0.266845129, >> 0.353675275, >> 0.030585143, >> 0.791285527, >> 0.60805832, >> 0.433466716, >> 0.039369439, >> 0.48918122, >> 0.740626586 >> ) >> >> When I run the qvalue function of the qvalue package, I get a rather >> surprising outcome: the q-values are much smaller than the corresponding >> p-values! >> >> See the following code below: >> >> source("http://bioconductor.org/biocLite.R") >> biocLite("qvalue") >> library(qvalue) >> >> q.values <- qvalue(p.values)$qvalue >> >> comparison <- cbind(p.values, q.values) >> >> colnames(comparison) <- c("p.values", "q.values") >> >> comparison >> >> p.values q.values >> [1,] 0.26684513 0.037111560 >> [2,] 0.35367528 0.037111560 >> [3,] 0.03058514 0.008960246 >> [4,] 0.79128553 0.040020397 >> [5,] 0.60805832 0.039540110 >> [6,] 0.43346672 0.037111560 >> [7,] 0.03936944 0.008960246 >> [8,] 0.48918122 0.037111560 >> [9,] 0.74062659 0.040020397 >> >> So what is happening here? Can I trust the q-values? In each case they are >> substantially larger than the P-values. Especially troubling to me are >> rows >> 4, 5, and 9. >> >> Thanks in advance for any information/advice! >> ----------------------------------- >> Josh Banta, Ph.D >> Assistant Professor >> Department of Biology >> The University of Texas at Tyler >> Tyler, TX 75799 >> Tel: (903) 565-5655 >> http://plantevolutionaryecology.org >> >> ______________________________________________________________________ >> The information in this email is confidential and intend...{{dropped:20}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
0
Entering edit mode

I generally use BH myself for any number of tests (my packages limma and edgeR use BH), because I don't feel that estimation of pi0 can be made bullet proof even for large numbers of tests.

Best wishes