Question: DESeq2: Abnormally skewed distribution of p-values
GFM (European Union) wrote 4.3 years ago:

Dear sirs, We have tried to apply DESeq2 to calculate differential expression between two samples (which we expect to be fairly similar in terms of gene expression). Indeed, we find a handful of differentially expressed genes, but when we look at the overall distribution of p-values we see strange behavior: the distribution is non-uniform, and in fact the frequency decreases monotonically toward lower p-values (like a ramp). What could be the reason for this unusual behavior, and do we need to re-run DESeq2 with a different configuration? Below are the R script and the sessionInfo output. Thank you for your help.

> dds <- DESeqDataSetFromMatrix(countData=countsTable, colData=colData, design=~conds);
> dds <- DESeq(dds, betaPrior = FALSE); 
> res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE, contrast=c("conds","WT_female","WT_male")); 
> sessionInfo() 
R version 3.1.1 (2014-07-10) 
Platform: x86_64-unknown-linux-gnu (64-bit) 

locale: 
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 

attached base packages: 
[1] parallel stats4 stats graphics grDevices utils datasets methods base 

other attached packages: 
[1] matrixStats_0.14.0 gplots_2.16.0 RColorBrewer_1.1-2 DESeq2_1.6.3 GenomicFeatures_1.18.7 AnnotationDbi_1.28.2 Biobase_2.26.0 
[8] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 S4Vectors_0.4.0 BiocInstaller_1.16.5 RcppArmadillo_0.5.000.0 Rcpp_0.11.5 
[15] BiocGenerics_0.12.1 methylKit_0.9.4 

loaded via a namespace (and not attached): 
[1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 BiocParallel_1.0.3 biomaRt_2.22.0 
[8] Biostrings_2.34.1 bitops_1.0-6 brew_1.0-6 caTools_1.17.1 checkmate_1.5.2 chron_2.3-45 cluster_2.0.1 
[15] codetools_0.2-11 colorspace_1.2-6 data.table_1.9.4 DBI_0.3.1 digest_0.6.8 fail_1.2 foreach_1.4.2 
[22] foreign_0.8-63 Formula_1.2-1 gdata_2.13.3 genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2 ggplot2_1.0.1 
[29] grid_3.1.1 gtable_0.1.2 gtools_3.4.2 Hmisc_3.15-0 iterators_1.0.7 KernSmooth_2.23-14 lattice_0.20-31 
[36] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-40 munsell_0.4.2 nnet_7.3-9 plyr_1.8.2 proto_0.3-10 
[43] RCurl_1.95-4.6 reshape2_1.4.1 rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0 rtracklayer_1.26.3 scales_0.2.4 
[50] sendmailR_1.2-1 splines_3.1.1 stringr_0.6.2 survival_2.38-1 tools_3.1.1 XML_3.98-1.1 xtable_1.7-4 
[57] XVector_0.6.0 zlibbioc_1.12.0
Answer: DESeq2: Abnormally skewed distribution of p-values
Michael Love (United States) wrote 4.3 years ago:

hi,

A p-value distribution that slopes down toward low p-values is conservative: the fraction of p-values below alpha is smaller than alpha.
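To make "conservative" concrete, here is a minimal base-R sketch on simulated data (not the poster's counts). Test statistics are drawn with a standard deviation smaller than the N(0, 1) the test assumes, which mimics an overestimated variance. The value 0.7 and all variable names are assumptions chosen for illustration.

```r
# Simulated illustration only; sd = 0.7 is an arbitrary assumption.
set.seed(1)
n <- 10000
alpha <- 0.05

# Well-calibrated null: test statistics really are N(0, 1).
z_calibrated <- rnorm(n, mean = 0, sd = 1)
# Conservative null: the statistics are less spread out than the
# N(0, 1) reference, as happens when the variance is overestimated.
z_conservative <- rnorm(n, mean = 0, sd = 0.7)

# Two-sided p-values computed against N(0, 1) in both cases.
p_calibrated <- 2 * pnorm(-abs(z_calibrated))
p_conservative <- 2 * pnorm(-abs(z_conservative))

# Fraction of p-values below alpha: close to alpha when calibrated,
# clearly below alpha when conservative.
frac_calibrated <- mean(p_calibrated < alpha)
frac_conservative <- mean(p_conservative < alpha)
print(c(calibrated = frac_calibrated, conservative = frac_conservative))

# Histogram counts in ten bins: the conservative case rises toward
# p = 1, which is the "ramp" shape described in the question.
h_conservative <- hist(p_conservative, breaks = seq(0, 1, by = 0.1), plot = FALSE)
print(h_conservative$counts)
```

With real results, the same check would be a histogram of the p-value column of the results table, e.g. `hist(res$pvalue)`.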

When you run DESeq2 on a dataset with two samples, you will see the following warning:

In checkForExperimentalReplicates(object, modelMatrix) :
  same number of samples and coefficients to fit,
  estimating dispersion by treating samples as replicates.
  read the ?DESeq section on 'Experiments without replicates'

If you read that section of the man page which the warning message points you to, you will see that the conservative behavior when running DESeq2 on experiments without replicates is expected:

     Experiments without replicates do not allow for estimation of the
     dispersion of counts around the expected value for each group,
     which is critical for differential expression analysis. If an
     experimental design is supplied which does not contain the
     necessary degrees of freedom for differential analysis, ‘DESeq’
     will provide a message to the user and follow the strategy
     outlined in Anders and Huber (2010) under the section 'Working
     without replicates', wherein all the samples are considered as
     replicates of a single group for the estimation of dispersion. As
     noted in the reference above: "Some overestimation of the variance
     may be expected, which will make that approach conservative."
     Furthermore, "while one may not want to draw strong conclusions
     from such an analysis, it may still be useful for exploration and
     hypothesis generation."
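A quick way to see whether a design has replicates before fitting is to tabulate the samples per condition and compare the number of samples with the number of model coefficients. The sketch below uses a hypothetical `colData` that mirrors the `conds` column from the question; the sample names and group sizes are assumptions, since the real sample table is not shown.

```r
# Hypothetical sample table; the real colData from the question is not shown.
colData <- data.frame(
  conds = factor(c(rep("WT_female", 3), rep("WT_male", 3))),
  row.names = paste0("sample", 1:6)
)

# Number of samples in each level of the design variable.
samples_per_group <- table(colData$conds)
print(samples_per_group)

# With design = ~conds the model has one coefficient per level. The
# warning quoted above refers to the case where the number of samples
# equals the number of coefficients, leaving no residual degrees of freedom.
n_coefficients <- ncol(model.matrix(~ conds, data = colData))
residual_df <- nrow(colData) - n_coefficients
has_replicates <- residual_df > 0
print(c(coefficients = n_coefficients, residual_df = residual_df))
```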

 

Answer: DESeq2: Abnormally skewed distribution of p-values
GFM (European Union) wrote 4.3 years ago:

Thanks for the answer. We have triplicates (and we did not get the warning).
Do you have any suggestions as to why we get that distribution of p-values, and whether there is a way to overcome this?
Thanks.


I can only guess based on your earlier post. When you said "We have tried to apply DESeq2 to calculate differential expression between two samples", I assumed you were comparing only two samples.

One approach to correct a conservative sloping "null part" of a p-value histogram is described in this document by Bernd Klaus, using the fdrtool package:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#inspection-and-correction-of-pvalues
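The linked document is not reproduced here. As a rough sketch of the general idea, assuming the fdrtool package is installed: `fdrtool()` with `statistic = "normal"` estimates an empirical null model from z-like statistics and returns recomputed p-values. The statistics below are simulated. With a real analysis one might pass the Wald statistic column of the results table (`res$stat`) after dropping `NA` values; that is an assumption about the workflow, not a transcript of the linked document.

```r
# Sketch only: simulated statistics stand in for DESeq2 Wald statistics.
set.seed(1)
z <- c(rnorm(9500, mean = 0, sd = 0.7),  # conservative null (sd < 1)
       rnorm(500, mean = 3, sd = 1))     # a few true effects

# Raw two-sided p-values against the theoretical N(0, 1) null.
p_raw <- 2 * pnorm(-abs(z))

if (requireNamespace("fdrtool", quietly = TRUE)) {
  # Estimate the empirical null from the data and recompute p-values.
  fit <- fdrtool::fdrtool(z, statistic = "normal", plot = FALSE, verbose = FALSE)
  print(fit$param[1, "sd"])  # estimated standard deviation of the null
  p_corrected <- fit$pval
  # Adjust the corrected p-values for multiple testing (Benjamini-Hochberg).
  padj <- p.adjust(p_corrected, method = "BH")
  print(sum(padj < 0.1))
} else {
  message("fdrtool is not installed; install.packages('fdrtool') to run this part")
}
```

An estimated null standard deviation well below 1 would be consistent with the conservative "ramp" shape. As with any re-estimated null, it is worth re-plotting the histogram of the corrected p-values before trusting them.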

Reply written 4.3 years ago by Michael Love
Answer: DESeq2: Abnormally skewed distribution of p-values
GFM (European Union) wrote 4.3 years ago:

This is very helpful. Thanks a lot!!!
