Dear sirs, We have tried to apply DESeq2 to calculate differential expression between two samples (which we expect to be fairly similar in terms of gene expression). Indeed, we find a handful of differentially expressed genes, but when we look at the overall distribution of p-values we see strange behavior: the distribution is non-uniform, and in fact the frequency decreases monotonically toward lower p-values (like a ramp). What could be the reason for this unusual behavior, and do we need to re-run DESeq with a different configuration? Below are the R script and the sessionInfo. Thank you for your help.
> dds <- DESeqDataSetFromMatrix(countData=countsTable, colData=colData, design=~conds)
> dds <- DESeq(dds, betaPrior = FALSE)
> res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE, contrast=c("conds","WT_female","WT_male"))
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] matrixStats_0.14.0 gplots_2.16.0 RColorBrewer_1.1-2 DESeq2_1.6.3 GenomicFeatures_1.18.7 AnnotationDbi_1.28.2 Biobase_2.26.0
[8] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 S4Vectors_0.4.0 BiocInstaller_1.16.5 RcppArmadillo_0.5.000.0 Rcpp_0.11.5
[15] BiocGenerics_0.12.1 methylKit_0.9.4

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 BiocParallel_1.0.3 biomaRt_2.22.0
[8] Biostrings_2.34.1 bitops_1.0-6 brew_1.0-6 caTools_1.17.1 checkmate_1.5.2 chron_2.3-45 cluster_2.0.1
[15] codetools_0.2-11 colorspace_1.2-6 data.table_1.9.4 DBI_0.3.1 digest_0.6.8 fail_1.2 foreach_1.4.2
[22] foreign_0.8-63 Formula_1.2-1 gdata_2.13.3 genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2 ggplot2_1.0.1
[29] grid_3.1.1 gtable_0.1.2 gtools_3.4.2 Hmisc_3.15-0 iterators_1.0.7 KernSmooth_2.23-14 lattice_0.20-31
[36] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-40 munsell_0.4.2 nnet_7.3-9 plyr_1.8.2 proto_0.3-10
[43] RCurl_1.95-4.6 reshape2_1.4.1 rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0 rtracklayer_1.26.3 scales_0.2.4
[50] sendmailR_1.2-1 splines_3.1.1 stringr_0.6.2 survival_2.38-1 tools_3.1.1 XML_3.98-1.1 xtable_1.7-4
[57] XVector_0.6.0 zlibbioc_1.12.0
I can only guess based on your earlier post. When you said "We have tried to apply DESeq2 to calculate differential expression between two samples", I assumed you were comparing only two samples.
One approach to correct a conservative sloping "null part" of a p-value histogram is described in this document by Bernd Klaus, using the fdrtool package:
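As a rough illustration of this kind of correction (a sketch, not necessarily the exact workflow in that document), the idea is to treat the Wald statistics as z-scores, let fdrtool estimate the width of the null distribution from the data instead of assuming a standard normal, and then recompute p-values against that empirical null. The example below runs on simulated statistics so that it is self-contained; the null standard deviation of 0.7, the number of genes, and all variable names are assumptions made for illustration. With a real DESeq2 results object you would presumably pass the `stat` column (after dropping NA rows) in place of `wald_stat`.

```r
library(fdrtool)

set.seed(1)

# Simulated stand-in for res$stat (assumption, not real data): the null genes
# have sd 0.7 instead of 1, which mimics a conservative test.
wald_stat <- c(rnorm(9500, mean = 0, sd = 0.7),  # genes with no true change
               rnorm(500,  mean = 4, sd = 1))    # genes with a true change

# P-values under the theoretical N(0, 1) null; their histogram slopes
# upward toward 1, like the ramp described in the question.
naive_pval <- 2 * pnorm(-abs(wald_stat))

# Let fdrtool estimate the null standard deviation from the data and
# recompute p-values against that empirical null.
corrected <- fdrtool(wald_stat, statistic = "normal",
                     plot = FALSE, verbose = FALSE)
null_sd <- corrected$param[1, "sd"]

# Benjamini-Hochberg adjustment of the corrected p-values.
corrected_padj <- p.adjust(corrected$pval, method = "BH")

# Compare the two histograms: the null part of the corrected one
# should look much flatter.
hist(naive_pval, breaks = 20, main = "Theoretical null", xlab = "p-value")
hist(corrected$pval, breaks = 20, main = "Empirical null", xlab = "p-value")
```

An estimated `null_sd` well below 1 indicates that the theoretical null was too wide for the data, which is what produces the ramp. Whether an empirical null is appropriate still depends on the design: it cannot substitute for missing replicates.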