Hi Fluvia,

if you get a hill shaped histogram, it simply means that the p-values are not "well calibrated".

It is a bit more involved to explain this more or less properly, so sorry in advance for this

rather long post.

In a typical RNA-Seq experiment, most genes will not be (strongly) differentially expressed, and the p-values for them should then follow a uniform distribution, while the p-values for the differentially expressed genes will be enriched for low p-values.

Now a simply way to formalize this is to think of the Wald test z-scores form DESeq as following

a two groups model, with a null and an alternative distribution. Now the "theoritical null" should

be a N(0,1) distribution, but sometimes this need not be the case and one gets too p--values that are too high in lower end range leading to a hill--shaped p--value histogram.

See section 5 of

http://www-huber.embl.de/users/klaus/Teaching/Testing-lab.pdf

For a simulated example that should make this clear. See

http://dx.doi.org/10.1515/sagmb-2013-0074

For a current paper discussing this phenomenon.

Now a simple solution to this problem (also followed in the SAGMB paper)

is to include a variance parameter sigma in the null distribution for the z-scores and

model it as N(0,sigma). If you have a hill shaped p-value histogram, the theoretical

sigma of 1 is most probably to high and now fdrtool can be used to estimate the sigma

parameter from the data, leading to an estimate less than 1.

See Efron's review article

http://arxiv.org/abs/0808.0572

and section 6 of his book "Large-Scale Inference

Empirical Bayes Methods for Estimation, Testing, and Prediction" for details on empirical null estimation.

Another important question is now where the problem with the "incorrect" p-values

actually come from.

My guess is that this often stems from the fact that DESeq2, just

like LIMMA, models one variance parameter per gene (the dispersion). This implicitly assumes that the within group variability is not different between the groups. So if this is not correct, since

e.g. the control group is less variable than the treatment group(s), it might lead to

wrong p-values. If the sample size is quite low, also outlying samples might also cause

the phenomenon.

Now one could model different variance parameters, but this becomes much harder

to estimate, so it is hard to do automatically in a software package. For an analysis that uses

multiple dispersion parameters per feature, see:

http://www-huber.embl.de/pub/pdf/Reyes2013.pdf

So it is easier just to re-estimate the null model from the test statistics and then use

this empirical null model to compute the p-values.

Best wishes,

Bernd