DESeq2 - hill-shaped histogram of p-values
deut13


A very interesting tutorial on using DESeq2 suggests using fdrtool to re-estimate the p-values in case the histogram has a hill shape, indicating an overestimation of the variance of the null distribution.

My question is: when, following this procedure, the estimated null model variance turns out to be lower than 1, does this mean that overestimation of the variance is indeed the most likely cause of the hill shape, and thus that the best way to extract the differentially expressed genes is to use the p-values re-estimated by fdrtool and the corresponding adjusted p-values?



deseq2 pvalue histogram
Bernd Klaus

Hi Fulvia,

if you get a hill-shaped histogram, it simply means that the p-values are not "well calibrated".
Explaining this more or less properly takes a bit of space, so sorry in advance for the
rather long post.

In a typical RNA-Seq experiment, most genes will not be (strongly) differentially expressed, and the p-values for them should then follow a uniform distribution, while the p-values for the differentially expressed genes will be enriched for low p-values.
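As a toy illustration (a simulation in Python rather than R, with made-up gene numbers and effect size, not actual DESeq2 output), this is the picture one expects from well-calibrated tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 9,000 "non-DE" genes: z-scores follow the theoretical N(0, 1) null
z_null = rng.normal(0.0, 1.0, size=9_000)
# 1,000 "DE" genes: z-scores shifted away from zero (made-up effect size)
z_alt = rng.normal(4.0, 1.0, size=1_000)

# Two-sided p-values against the N(0, 1) null
p = 2 * stats.norm.sf(np.abs(np.concatenate([z_null, z_alt])))

counts, _ = np.histogram(p, bins=10, range=(0, 1))
# A peak in the first bin (mostly the DE genes) sitting on a roughly
# flat floor of about 900 genes per bin (the null genes)
print(counts)
```

Any systematic departure from this "flat floor plus a spike near zero" shape is a sign of miscalibration.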

Now a simple way to formalize this is to think of the Wald test z-scores from DESeq2 as following
a two-groups model, with a null and an alternative distribution. The "theoretical null" should
be an N(0,1) distribution, but sometimes this is not the case, and one gets p-values that are too high in the lower end of the range, leading to a hill-shaped p-value histogram.

See section 5 of

for a simulated example that should make this clear, and see

for a current paper discussing this phenomenon.

Now a simple solution to this problem (also followed in the SAGMB paper)
is to include a variance parameter sigma in the null distribution for the z-scores and
model it as N(0, sigma). If you have a hill-shaped p-value histogram, the theoretical
sigma of 1 is most probably too high, and fdrtool can be used to estimate the sigma
parameter from the data, leading to an estimate less than 1.
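Sketched in Python (the MAD-based scale estimate below is my own crude stand-in; fdrtool's actual empirical-null estimator is more refined), re-estimating sigma from the z-scores and recomputing the p-values recalibrates them:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.normal(0.0, 0.7, size=20_000)  # null z-scores, true sd 0.7

# Crude empirical-null scale estimate from the bulk of the z-scores
# (fdrtool uses a more refined estimator; MAD is just an illustration)
sigma_hat = stats.median_abs_deviation(z, scale="normal")
# sigma_hat comes out close to 0.7, well below the theoretical 1

# Recompute two-sided p-values against the empirical null N(0, sigma_hat)
p = 2 * stats.norm.sf(np.abs(z), scale=sigma_hat)

counts, _ = np.histogram(p, bins=10, range=(0, 1))
# The histogram is now roughly flat, around 2,000 genes per bin
print(counts)
```

The key point is that the correction uses only the observed test statistics, not the raw counts, which is what makes it easy to bolt onto an existing DESeq2 analysis.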

See Efron's review article

and section 6 of his book "Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction" for details on empirical null estimation.

Another important question is where the problem with the "incorrect" p-values
actually comes from.

My guess is that this often stems from the fact that DESeq2, just
like limma, models one variance parameter per gene (the dispersion). This implicitly assumes that the within-group variability does not differ between the groups. If this assumption does not hold,
e.g. because the control group is less variable than the treatment group(s), it can lead to
wrong p-values. If the sample size is quite low, outlying samples might also cause
the phenomenon.

Now one could model different variance parameters, but these become much harder
to estimate, so this is hard to do automatically in a software package. For an analysis that uses
multiple dispersion parameters per feature, see:

So it is easier just to re-estimate the null model from the test statistics and then use
this empirical null model to compute the p-values.

Best wishes,

Bernd




Thanks Bernd! Just to mention: another possible cause of a non-flat shape (in the center and right side) of p-value histograms is the discreteness of very low counts, which sometimes make up the majority of the rows in a dataset. Removing the low-count rows (e.g. row mean < 5) sometimes recovers a flat shape (in the center and right side), which tells us that the p-values were in fact properly calibrated. Another reason: if you use lfcThreshold with a positive threshold, we no longer expect uniform p-values, due to the conservativeness of the test construction (see the DESeq2 paper).
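To illustrate the discreteness point (using an exact binomial test in Python as a stand-in; DESeq2's Wald test is different, but very low counts cause the same clumping of p-values):

```python
from scipy import stats

# With a total of only 4 reads split between two conditions, an exact
# binomial test can produce just a few distinct two-sided p-values,
# so a histogram over many such genes cannot possibly be flat
pvals = sorted({round(stats.binomtest(k, n=4, p=0.5).pvalue, 6)
                for k in range(5)})
print(pvals)  # only three distinct values are possible
```

When a large fraction of rows are this shallow, the histogram is dominated by a few spikes, which is why filtering low-count rows can restore the expected flat shape.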


Thank you very much Bernd and Michael for the detailed explanation and references!

I have indeed seen that removing (very) low-count rows can in some cases recover a flat shape. In the cases where the hill shape remains evident, I was wondering whether this p-value re-estimation procedure is the best way to proceed, or whether it might lead to "too many" rejections of the null, so that it might in some cases be safer to be more conservative. So far I have tried the correction procedure in only a couple of cases and found that the number of rejections can change dramatically (though I saw this in a case with a very small sample size). I will carefully read your suggested references!

As a side note, I have read in other posts (for example here: Right skewed histogram of p-values) that a "right skew" of the histogram can be a sign of an uncontrolled covariate. However, I assumed this refers to cases where the histogram shows a more or less monotonic increase of frequencies with increasing p-values, as in the simulated data by Simon in this post.



Hi Fulvia,

yes, a "monotone" histogram might in fact be due to unexplained covariates. In the original publication of the method implemented in RUVSeq,

"Using control genes to correct for unwanted variation in microarray data",

the authors argue that one should remove as many hidden variables/factors as necessary to obtain a "proper" p-value histogram. See their section 2.2 and their figure 2 for an example on real data with severe batch effects that is very similar to Simon's.

Hope that helps,

Bernd





Thanks a lot Bernd for the further suggestions!

