After reading a post that recommended checking p-value distributions after DE testing, I went back and looked at a dataset where DE testing had yielded 0 significantly DE genes (3 replicates, 3 conditions, 3 time points). Note that in this experiment all samples were loaded into DESeq2 so that the additional samples could be used for dispersion estimation. It turns out those p-value distributions were all hill-shaped, like so

This is not expected for a DE test that resulted in 0 significant genes, where the p-values should be roughly uniformly distributed. Instead, this seems to reflect overestimated dispersion: many genes were assigned high p-values because of their high gene-wise dispersion estimates, when their true dispersion should be lower (or estimated group-wise rather than gene-wise).
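The diagnostic itself is just a histogram of the raw (unadjusted) p-values. A minimal sketch, assuming `dds` is a `DESeqDataSet` on which `DESeq()` has already been run:

```r
library(DESeq2)

res <- results(dds)
# plot the raw p-values, dropping genes set to NA by independent filtering
hist(res$pvalue[!is.na(res$pvalue)], breaks = 50,
     col = "lightblue", main = "Raw p-value distribution")
```

Under a true null for most genes, this histogram should be flat with (at most) a spike near 0 from the DE genes; a hill in the middle suggests the test statistics are miscalibrated.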

Thus, I attempted to correct this using the recommended fdrtool and got a better-looking p-value distribution (I think). Strikingly, the same comparison that had just yielded 0 significant DE genes now yielded over 1000.
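For reference, the fdrtool-based correction re-estimates the null distribution from the Wald statistics and recomputes p-values from it. A hedged sketch of that workflow, assuming `dds` is a fitted `DESeqDataSet`:

```r
library(DESeq2)
library(fdrtool)

res <- results(dds)
res <- res[!is.na(res$padj), ]           # drop genes removed by independent filtering

# re-estimate the empirical null from the Wald statistics
fdr_fit <- fdrtool(res$stat, statistic = "normal", plot = FALSE)

res$pvalue <- fdr_fit$pval               # corrected p-values
res$padj   <- p.adjust(res$pvalue, method = "BH")  # re-do BH adjustment

hist(res$pvalue, breaks = 50, main = "Corrected p-value distribution")
```

The key step is that `fdrtool` fits the null variance itself instead of assuming a standard normal for the Wald statistics.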

To further see how dispersion might be affecting this, I ran the same test but loaded into DESeq2 only the samples I wanted to compare (rather than the whole experiment). I had done this in the past, before correcting the p-value distribution as above, and got results similar to loading all samples into DESeq2.
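Subsetting to only the samples of interest can be sketched as below; `group` and the level names `"A"`/`"B"` are hypothetical placeholders for whatever the actual design column and conditions are:

```r
library(DESeq2)

# keep only the two groups being compared, then refit from scratch
keep <- colData(dds_all)$group %in% c("A", "B")
dds_sub <- dds_all[, keep]
dds_sub$group <- droplevels(dds_sub$group)  # drop unused factor levels

# dispersion is now estimated from these samples only
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("group", "B", "A"))
```

Because `DESeq()` is re-run on the subset, the dispersion trend and prior are estimated from those samples alone, which is exactly where the two approaches can diverge.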

The p-value distribution and its correction looked similar to the histograms above; however, the new DE result contained fewer than 200 genes, compared with over 1000 when loading all samples! Looking at a PCA plot, it does not appear that these samples need to be run separately.

So my question is: am I right to correct p-values after examining their distributions? And why do I get such huge differences after the correction depending on how I load my samples into DESeq2?

Hi Bernd,

Thank you for the detailed reply, I have gone ahead and followed your recommendations.

2) Between time points, the estimated dispersion prior variance differs by only 0.02. However, when loading only the samples relevant to a pairwise comparison of interest, there are instances where the estimated dispersion prior variance differs by as much as 0.21. Since the biggest differences in dispersion prior variance appear when samples are grouped by comparison, this leads me to think that there is indeed very different dispersion within groups.

4) When running LRTs with only the samples of interest loaded, I get results similar to the original Wald test without p-value distribution adjustment (many comparisons with 0 DE genes, right-skewed p-value distributions). However, when these p-value distributions are adjusted, they become U-shaped and yield 4-5 thousand differentially expressed genes (the Wald test with p-value distribution adjustment yields at most 1100 genes).

Why does the behavior differ between tests when adjusting the p-value distribution? I am not interested in using the LRT to compare multiple levels at once or to find genes that change over time, so I am limited to using it when loading only the samples I want to compare, testing the full model against the reduced (intercept-only) model.
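The LRT setup described above (full model vs. intercept-only reduced model on the subsetted samples) looks like this in DESeq2; `group` is again a placeholder design variable:

```r
library(DESeq2)

# likelihood ratio test: full model ~ group against reduced intercept-only model
dds_sub <- DESeq(dds_sub, test = "LRT", full = ~ group, reduced = ~ 1)
res_lrt <- results(dds_sub)
```

With only two groups loaded, this LRT tests the same null hypothesis as the Wald test, so large discrepancies between the two point at the test statistics (and any post-hoc p-value adjustment) rather than the hypothesis being tested.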

Because of the difference in the estimated dispersion prior when loading only samples of interest, I believe it is best to stick with the Wald test and load only the samples of interest.

Does the first histogram use pre-filtering, e.g. keeping only genes with a normalized count >= 10 in 3 or more samples? The spikes may just come from low counts. We showed this in a DESeq2 supplementary figure.
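The pre-filter described above can be sketched as follows, assuming size factors have already been estimated (via `DESeq()` or `estimateSizeFactors()`):

```r
library(DESeq2)

# keep genes with a normalized count of at least 10 in at least 3 samples
keep <- rowSums(counts(dds, normalized = TRUE) >= 10) >= 3
dds <- dds[keep, ]
```

Low-count genes produce discrete, spiky p-value distributions, so removing them before inspecting the histogram makes the shape easier to interpret.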

You can also try to compare the dispersion estimates for the different time points directly, as the prior comparison assumes that the trend fit is similar (I edited my answer accordingly). They can be accessed via

`rowData(dds)$dispersion`. And you could e.g. look at a scatterplot of `log10(rowData(dds)$dispersion)` in time 1 vs time 2.
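A sketch of that scatterplot, assuming `dds1` and `dds2` are `DESeqDataSet`s fitted separately on the time 1 and time 2 samples (with the same genes in the same order):

```r
# gene-wise dispersion estimates from each separately fitted dataset
d1 <- rowData(dds1)$dispersion
d2 <- rowData(dds2)$dispersion

plot(log10(d1), log10(d2), pch = 16, cex = 0.4,
     xlab = "log10 dispersion, time 1",
     ylab = "log10 dispersion, time 2")
abline(0, 1, col = "red")  # identity line: points off-diagonal differ between times
```

If the cloud of points sits close to the identity line, the dispersions are comparable across time points and fitting all samples together is reasonable.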

fdrtool estimates the null model variance from the data, so it can also underestimate it, which commonly leads to the bathtub-shaped histogram. It might be confused by the spikes, so try pre-filtering as suggested by Mike first.