Question

mock comparison p-value histogram in DEXSeq


Guest User ★ 13k

@guest-user-4897


Dear All:

In the DEXSeq paper, the authors compared DEXSeq with Cuffdiff in terms of controlling Type-I error rates. In the mock comparison results (control vs. control), DEXSeq reported far fewer genes with differential exon usage (DEU), as shown in Table S2 of the DEXSeq paper (2012). However, I think this kind of mock comparison is "under the null", which means that if we plot a histogram of the p-values from such a comparison, it should be very close to the histogram of a uniform random variable. I am not sure whether the DEXSeq authors have checked that, or whether they consider it inappropriate.

I used my own dataset to make a control vs. control comparison, and happily found very few genes with DEU (which is good). However, when I plot the raw p-values (not the B-H adjusted p-values), the resulting histogram does not look uniform. The bin heights increase monotonically, i.e. the frequencies increase as the p-values increase. In other words, very few small p-values are reported for the control vs. control comparison. What can I tell from such a histogram?

The reason I ask is that, even though the number of genes reported with DEU is small when using DEXSeq for a mock comparison, the p-values, in my thinking, should look uniform. The small numbers are reassuring, but the histogram would convince me more.

Thank you for your suggestions. Please correct me if I am wrong!
-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] DEXSeq_1.6.0 Biobase_2.20.0 BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] biomaRt_2.16.0       Biostrings_2.28.0 bitops_1.0-5
 [4] GenomicRanges_1.12.4 hwriter_1.3       IRanges_1.18.1
 [7] RCurl_1.95-4.1       Rsamtools_1.12.3  statmod_1.4.17
[10] stats4_3.0.1         stringr_0.6.2     tools_3.0.1
[13] XML_3.96-1.1         zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.
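The histogram check described in the question can also be put into numbers. The following is only a rough sketch in plain Python, not something taken from DEXSeq: the helper names `bin_fractions` and `ks_distance_to_uniform` are made up for illustration, and the "skewed" p-values are synthetic (square roots of uniform draws, which have density 2p and therefore pile up near 1), standing in for the kind of histogram described above.

```python
import random

def bin_fractions(pvalues, n_bins=10):
    """Fraction of p-values in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # A p-value of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(pvalues) for c in counts]

def ks_distance_to_uniform(pvalues):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF."""
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Compare the uniform CDF (x itself) with the ECDF just after and just before x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

if __name__ == "__main__":
    rng = random.Random(1)
    uniform_p = [rng.random() for _ in range(20000)]
    # Synthetic stand-in for conservative p-values (assumption, not DEXSeq output).
    skewed_p = [u ** 0.5 for u in uniform_p]
    for label, pv in (("uniform", uniform_p), ("skewed", skewed_p)):
        fractions = ", ".join(f"{f:.3f}" for f in bin_fractions(pv))
        print(label, "KS distance:", round(ks_distance_to_uniform(pv), 3))
        print(label, "bin fractions:", fractions)
```

For a well-calibrated null comparison, each of the ten bins should hold roughly a tenth of the p-values and the distance should be close to zero. Bin fractions that rise steadily towards 1 correspond to the histogram shape reported here.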


updated 12.5 years ago by Alejandro Reyes ★ 1.9k • written 12.5 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2013-07-12

Dear Gu,

Thanks for pointing this out! You are right, and we have observed the same thing when doing mock comparisons with DEXSeq. The skew of the p-value distribution towards 1 is caused by the dispersion values used for testing: since we take the maximum of the fitted value and the per-exon estimate, our test becomes conservative (hence the skewed distribution). You will see that if one uses only the per-exon estimates, the p-values are not skewed, but one will call some annoying outliers.

Admittedly, this maximum rule is not the most elegant solution, but it is a temporary measure to get rid of outliers. In the future we will probably integrate new approaches, such as the DESeq2 Bayesian shrinkage, in order to avoid doing this.

Best regards,
Alejandro
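To see why a maximum rule pushes null p-values towards 1, here is a small toy simulation in plain Python. It is only an analogue under loud assumptions: it uses Gaussian data and a z-test instead of the negative binomial model that DEXSeq fits, the "fitted" variance is simply set to the true value, and the function names (`two_sided_p`, `simulate`) are invented for this sketch. For each simulated feature with no true difference, it computes one p-value using the fitted variance and one using the maximum of the fitted and the per-feature estimate.

```python
import random
from statistics import NormalDist, mean, variance

def two_sided_p(z):
    """Two-sided p-value of a z statistic under a standard normal reference."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def simulate(n_features=5000, n_per_group=3, fitted_var=1.0, seed=0):
    """Null comparison: both groups are drawn from N(0, 1) for every feature.

    Returns two lists of p-values, one using the fitted variance only and
    one using max(fitted variance, per-feature variance estimate).
    """
    rng = random.Random(seed)
    p_fitted, p_max = [], []
    for _ in range(n_features):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = mean(a) - mean(b)
        # Pooled per-feature variance estimate (noisy with few replicates).
        per_feature_var = (variance(a) + variance(b)) / 2.0
        for var, out in ((fitted_var, p_fitted),
                         (max(fitted_var, per_feature_var), p_max)):
            se = (2.0 * var / n_per_group) ** 0.5
            out.append(two_sided_p(diff / se))
    return p_fitted, p_max

if __name__ == "__main__":
    p_fitted, p_max = simulate()
    for label, pv in (("fitted variance only", p_fitted),
                      ("max(fitted, per-feature)", p_max)):
        below = sum(p < 0.05 for p in pv) / len(pv)
        print(f"{label}: mean p = {mean(pv):.3f}, fraction p < 0.05 = {below:.3f}")
```

Because the maximum can only make the variance larger, every p-value in the second list is at least as large as its counterpart in the first, so the histogram is shifted towards 1 and fewer features fall below any cutoff. That is the sense in which the rule is conservative; how strong the effect is in real DEXSeq output depends on the data and is not captured by this toy.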