If you get a hill-shaped histogram, it simply means that the p-values are not "well calibrated".
Explaining this more or less properly takes some space, so sorry in advance for this
rather long post.
In a typical RNA-Seq experiment, most genes will not be (strongly) differentially expressed. The p-values for these genes should follow a uniform distribution, while the p-values for the differentially expressed genes will be enriched for low values.
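This behaviour is easy to see in a small simulation (a toy sketch, not DESeq2 itself): two-sided p-values computed from z-scores are uniform for null genes, while genes with a shifted z-score pile up at the low end. The effect size of 3 below is an arbitrary illustrative choice.

```python
# Two-sided p-values from z-scores against a standard normal null.
# Null genes give roughly uniform p-values; "differentially expressed"
# genes (shifted z-scores) are enriched for small p-values.
import math
import random

random.seed(1)

def pvalue(z):
    """Two-sided p-value of z against a N(0, 1) null."""
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

null_p = [pvalue(random.gauss(0.0, 1.0)) for _ in range(20000)]
de_p = [pvalue(random.gauss(3.0, 1.0)) for _ in range(2000)]  # effect size 3 is made up

# Null p-values: each decile of [0, 1] holds roughly 10% of the genes.
deciles = [sum(1 for p in null_p if i / 10 <= p < (i + 1) / 10) for i in range(10)]
print(deciles)

# DE p-values are strongly enriched at the low end.
print(sum(1 for p in de_p if p < 0.05) / len(de_p))
```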
Now a simple way to formalize this is to think of the Wald test z-scores from DESeq2 as following
a two-groups model, with a null and an alternative distribution. The "theoretical null" should
be a N(0,1) distribution, but this need not always be the case: the true null distribution can be narrower, so the p-values come out too high in the lower end of the range, leading to a hill-shaped p-value histogram.
See section 5 of
for a simulated example that should make this clear. See
for a recent paper discussing this phenomenon.
Now a simple solution to this problem (also followed in the SAGMB paper)
is to include a scale parameter sigma in the null distribution of the z-scores and
model it as N(0, sigma). If you have a hill-shaped p-value histogram, the theoretical
sigma of 1 is most probably too high, and fdrtool can be used to estimate the sigma
parameter from the data, leading to an estimate less than 1.
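A toy version of this empirical-null idea (fdrtool's actual estimator is far more careful) is to estimate sigma robustly from the bulk of the z-scores, e.g. via the median absolute deviation, and then recompute the p-values against N(0, sigma_hat). All the numbers below (null sd 0.8, 5% DE genes with mean 4) are made up for illustration.

```python
# Estimate the null sd robustly from the bulk of the z-scores, then
# recompute p-values against N(0, sigma_hat) instead of N(0, 1).
import math
import random

random.seed(3)

def pvalue(z, sigma=1.0):
    phi = 0.5 * (1.0 + math.erf(abs(z) / (sigma * math.sqrt(2.0))))
    return 2.0 * (1.0 - phi)

# Mostly null z-scores with sd 0.8, plus a small DE fraction (all made up).
z = [random.gauss(0.0, 0.8) for _ in range(19000)] + \
    [random.gauss(4.0, 1.0) for _ in range(1000)]

# Median-absolute-deviation scale estimate, robust to the DE genes;
# dividing by 0.6745 makes it consistent for a normal distribution.
med = sorted(z)[len(z) // 2]
mad = sorted(abs(v - med) for v in z)[len(z) // 2]
sigma_hat = mad / 0.6745
print(sigma_hat)  # well below the theoretical 1

p = [pvalue(v, sigma_hat) for v in z[:19000]]  # null genes only
frac = sum(1 for v in p if v < 0.05) / len(p)
print(frac)  # much closer to the nominal 0.05 than before
```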
See Efron's review article and section 6 of his book "Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction" for details on empirical null estimation.
Another important question is where the problem with the "incorrect" p-values
actually comes from.
My guess is that this often stems from the fact that DESeq2, just
like limma, models a single variance parameter per gene (the dispersion). This implicitly assumes that the within-group variability is the same across the groups. If this is not correct,
e.g. because the control group is less variable than the treatment group(s), it might lead to
wrong p-values. If the sample size is quite low, outlying samples might also cause problems.
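Here is a small illustration of that point (toy numbers, not DESeq2's actual model): with unequal group sizes and unequal group variances, a pooled two-sample t-test — i.e. one shared variance parameter — rejects far more than 5% of truly null genes. The group sizes, standard deviations, and gene count are all invented for the sketch.

```python
# No gene is truly differential here, yet the pooled-variance t-test
# rejects well above the nominal 5% because one group is much more
# variable than the other and the group sizes differ.
import math
import random

random.seed(4)

T_CRIT = 2.365          # two-sided 5% critical value of t with 7 df
n1, n2 = 6, 3           # unequal group sizes (illustrative)
sd1, sd2 = 1.0, 3.0     # second group much more variable (illustrative)

genes = 20000
rejections = 0
for _ in range(genes):
    x = [random.gauss(0.0, sd1) for _ in range(n1)]
    y = [random.gauss(0.0, sd2) for _ in range(n2)]
    m1, m2 = sum(x) / n1, sum(y) / n2
    s1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    s2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    # One pooled variance for both groups, as a single dispersion would assume.
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    if abs(t) > T_CRIT:
        rejections += 1

frac = rejections / genes
print(frac)  # clearly above the nominal 0.05
```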
Now one could model a separate variance parameter per group, but these are much harder
to estimate, so it is hard to do automatically in a software package. For an analysis that uses
multiple dispersion parameters per feature, see:
So it is easier to simply re-estimate the null model from the observed test statistics and then use
this empirical null model to compute the p-values.