Question

DESeq2: Abnormally skewed distribution of p-values

GFM (@gfm-8326) · European Union

Dear sirs, we have tried to apply DESeq2 to calculate differential expression between two samples (which we expect to be fairly similar in terms of gene expression). Indeed, we find a handful of differentially expressed genes, but when we look at the overall distribution of p-values we see strange behavior: the distribution is non-uniform, and in fact the frequency of p-values decreases monotonically toward zero (like a ramp). What could be the reason for this unusual behavior, and do we need to re-run DESeq2 with a different configuration? Below are the R script and the sessionInfo. Thank you for your help.

> dds <- DESeqDataSetFromMatrix(countData=countsTable, colData=colData, design=~conds);
> dds <- DESeq(dds, betaPrior = FALSE); 
> res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE, contrast=c("conds","WT_female","WT_male")); 
> sessionInfo() 
R version 3.1.1 (2014-07-10) 
Platform: x86_64-unknown-linux-gnu (64-bit) 

locale: 
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 

attached base packages: 
[1] parallel stats4 stats graphics grDevices utils datasets methods base 

other attached packages: 
[1] matrixStats_0.14.0 gplots_2.16.0 RColorBrewer_1.1-2 DESeq2_1.6.3 GenomicFeatures_1.18.7 AnnotationDbi_1.28.2 Biobase_2.26.0 
[8] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 S4Vectors_0.4.0 BiocInstaller_1.16.5 RcppArmadillo_0.5.000.0 Rcpp_0.11.5 
[15] BiocGenerics_0.12.1 methylKit_0.9.4 

loaded via a namespace (and not attached): 
[1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 BiocParallel_1.0.3 biomaRt_2.22.0 
[8] Biostrings_2.34.1 bitops_1.0-6 brew_1.0-6 caTools_1.17.1 checkmate_1.5.2 chron_2.3-45 cluster_2.0.1 
[15] codetools_0.2-11 colorspace_1.2-6 data.table_1.9.4 DBI_0.3.1 digest_0.6.8 fail_1.2 foreach_1.4.2 
[22] foreign_0.8-63 Formula_1.2-1 gdata_2.13.3 genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2 ggplot2_1.0.1 
[29] grid_3.1.1 gtable_0.1.2 gtools_3.4.2 Hmisc_3.15-0 iterators_1.0.7 KernSmooth_2.23-14 lattice_0.20-31 
[36] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-40 munsell_0.4.2 nnet_7.3-9 plyr_1.8.2 proto_0.3-10 
[43] RCurl_1.95-4.6 reshape2_1.4.1 rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0 rtracklayer_1.26.3 scales_0.2.4 
[50] sendmailR_1.2-1 splines_3.1.1 stringr_0.6.2 survival_2.38-1 tools_3.1.1 XML_3.98-1.1 xtable_1.7-4 
[57] XVector_0.6.0 zlibbioc_1.12.0
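For readers who want to look at the same diagnostic on data where the answer is known, here is a minimal sketch (not part of the original post) that runs the same calls on DESeq2's simulated `makeExampleDESeqDataSet()` data, with three replicates per condition and no true differences (`betaSD = 0`), and then draws the p-value histogram. It assumes a recent DESeq2 installation; the seed and object names are illustrative.

```r
# Sketch: p-value histogram on simulated counts with no true differential expression.
# Assumes DESeq2 is installed; makeExampleDESeqDataSet() simulates negative binomial counts.
suppressPackageStartupMessages(library(DESeq2))

set.seed(42)
# 1000 genes, 6 samples (3 in condition "A", 3 in "B"); betaSD = 0 means no true fold changes
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 0)

# Same settings as in the question: no fold-change prior, no Cook's or independent filtering
dds <- DESeq(dds, betaPrior = FALSE, quiet = TRUE)
res <- results(dds, cooksCutoff = FALSE, independentFiltering = FALSE,
               contrast = c("condition", "B", "A"))

# Under the null this histogram should look roughly flat;
# a ramp rising toward p = 1 would indicate conservative p-values
pv <- res$pvalue[!is.na(res$pvalue)]
h <- hist(pv, breaks = seq(0, 1, by = 0.05),
          main = "p-values, simulated null data", xlab = "p-value")
```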


score 2 · Answer 1 · 2015-08-10

hi,

A p-value distribution that slopes down toward the low p-values is conservative: fewer than a fraction alpha of the p-values fall below alpha.
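A small simulation can make "conservative" concrete. The sketch below is a hypothetical setting, not the poster's data: it assumes null test statistics whose standard errors are overestimated, so the statistic has a standard deviation of 0.7 instead of 1, and then computes p-values as if the null were standard normal.

```r
# Hypothetical null statistics with overestimated standard errors: sd 0.7 instead of 1
set.seed(1)
n_genes <- 10000
sd_null <- 0.7
z <- rnorm(n_genes, mean = 0, sd = sd_null)

# Two-sided p-values computed as if z were standard normal
pvals <- 2 * pnorm(-abs(z))

alpha <- 0.05
frac_below <- mean(pvals < alpha)
cat(sprintf("fraction of p-values < %.2f: %.4f (nominal %.2f)\n",
            alpha, frac_below, alpha))

# Bins near 0 hold fewer p-values than bins near 1: a ramp rising toward p = 1
h <- hist(pvals, breaks = seq(0, 1, by = 0.05),
          main = "Conservative null p-values", xlab = "p-value")
```

With these assumptions only about 0.5% of the null p-values fall below 0.05, so a test at alpha = 0.05 rejects far less often than its nominal rate.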

When you run DESeq2 on a dataset with two samples, you will see the following warning:

In checkForExperimentalReplicates(object, modelMatrix) :
  same number of samples and coefficients to fit,
  estimating dispersion by treating samples as replicates.
  read the ?DESeq section on 'Experiments without replicates'

If you read the section of the man page that the warning message points you to, you will see that conservative behavior is expected when running DESeq2 on experiments without replicates:

     Experiments without replicates do not allow for estimation of the
     dispersion of counts around the expected value for each group,
     which is critical for differential expression analysis. If an
     experimental design is supplied which does not contain the
     necessary degrees of freedom for differential analysis, ‘DESeq’
     will provide a message to the user and follow the strategy
     outlined in Anders and Huber (2010) under the section 'Working
     without replicates', wherein all the samples are considered as
     replicates of a single group for the estimation of dispersion. As
     noted in the reference above: "Some overestimation of the variance
     may be expected, which will make that approach conservative."
     Furthermore, "while one may not want to draw strong conclusions
     from such an analysis, it may still be useful for exploration and
     hypothesis generation."
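Why treating the samples as replicates overestimates the variance can be sketched with a simplified Gaussian model on a log scale. This is an assumption for illustration only; DESeq2 itself models counts with a negative binomial and shrinks dispersion estimates. With one sample per condition, the sample variance of the two values picks up any real difference between the conditions on top of the true within-condition variance.

```r
# Simplified Gaussian illustration (not DESeq2's model) of pooling samples
# from different conditions as if they were replicates.
set.seed(2)
n_genes <- 5000
sigma <- 0.3   # hypothetical true within-condition standard deviation

# One sample per condition; about 20% of genes carry a real shift between conditions
is_de <- runif(n_genes) < 0.2
shift <- ifelse(is_de, rnorm(n_genes, mean = 0, sd = 1), 0)
a <- rnorm(n_genes, mean = 0, sd = sigma)       # condition 1
b <- rnorm(n_genes, mean = shift, sd = sigma)   # condition 2

# Sample variance of the two values when treated as replicates of one group
pooled_var <- (a - b)^2 / 2

cat(sprintf("true variance:                      %.3f\n", sigma^2))
cat(sprintf("mean pooled estimate, non-DE genes: %.3f\n", mean(pooled_var[!is_de])))
cat(sprintf("mean pooled estimate, DE genes:     %.3f\n", mean(pooled_var[is_de])))
```

For non-DE genes the pooled estimate is unbiased, but for genes with a real shift it is inflated by roughly half the squared shift, which is the "overestimation of the variance" the quoted passage refers to.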

score 0 · Answer 2 · 2015-08-11

GFM (@gfm-8326) · European Union

Thanks for the answer. We have triplicates (and we did not get the warning).
Do you have any suggestions as to why we get this distribution of p-values, and is there a way to overcome it?
Thanks.


I can only guess based on your earlier post. When you said "We have tried to apply DESeq2 to calculate differential expression between two samples", I assumed you were comparing only two samples.

One approach to correcting a conservative, sloping "null part" of a p-value histogram, using the fdrtool package, is described in this document by Bernd Klaus:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#inspection-and-correction-of-pvalues
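The idea of the fdrtool approach can be sketched as follows; this is an illustration under stated assumptions, not a copy of the linked document. It assumes the CRAN package fdrtool is installed. In a real DESeq2 analysis the input would be the Wald statistics (`res$stat`, with NA rows removed); here the statistics are simulated, with a hypothetical null standard deviation of 0.7 plus 200 genes carrying a real shift. fdrtool estimates the null standard deviation from the data and returns p-values under that estimated null, which are then adjusted with Benjamini-Hochberg.

```r
# Sketch of re-estimating a conservative null with fdrtool (assumes fdrtool is installed).
library(fdrtool)

set.seed(3)
# Hypothetical statistics: 9800 null genes with sd 0.7 instead of 1
# (overestimated standard errors), plus 200 genes with a real shift of +/-4
z <- c(rnorm(9800, mean = 0, sd = 0.7),
       rnorm(200, mean = 4 * sample(c(-1, 1), 200, replace = TRUE), sd = 1))

# Naive two-sided p-values assume a standard normal null
p_naive <- 2 * pnorm(-abs(z))

# fdrtool fits the null component, including its standard deviation, from the data
fdr_res <- fdrtool(z, statistic = "normal", plot = FALSE, verbose = FALSE)
sd_hat <- fdr_res$param[1, "sd"]

# p-values under the estimated null, followed by Benjamini-Hochberg adjustment
p_corrected <- fdr_res$pval
padj <- p.adjust(p_corrected, method = "BH")

cat(sprintf("estimated null sd: %.2f\n", sd_hat))
cat(sprintf("p < 0.05, naive: %d, corrected: %d\n",
            sum(p_naive < 0.05), sum(p_corrected < 0.05)))
```

Because the null standard deviation is estimated rather than fixed at 1, the corrected p-values of null genes should again be roughly uniform, so the ramp in the histogram flattens.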

Michael Love

score 0 · Answer 3 · 2015-08-11

GFM (@gfm-8326) · European Union

This is very helpful. Thanks a lot!!!
