Hi Fulvia,

if you get a hill-shaped histogram, it simply means that the p-values are not "well calibrated".

Explaining this properly is a bit involved, so apologies in advance for a rather long post.

In a typical RNA-Seq experiment, most genes will not be (strongly) differentially expressed, so their p-values should follow a uniform distribution, while the p-values of the differentially expressed genes will be enriched for low values.
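As a quick illustration, here is a stdlib-only Python sketch (all numbers are made up: 9000 null genes with N(0,1) z-scores and 1000 "DE" genes with shifted z-scores standing in for differential expression). The first histogram bin spikes while the rest stay roughly flat:

```python
import math, random

random.seed(1)

def p_two_sided(z):
    """Two-sided p-value of a z-score under a standard normal null."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# 90% null genes (z ~ N(0,1)) and 10% "DE" genes (shifted mean, made up).
zs = [random.gauss(0, 1) for _ in range(9000)] + \
     [random.gauss(3, 1) for _ in range(1000)]
pvals = [p_two_sided(z) for z in zs]

# Crude 10-bin p-value histogram: a spike in the first bin, flat elsewhere.
bins = [0] * 10
for p in pvals:
    bins[min(int(p * 10), 9)] += 1
print(bins)
```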

Now a simple way to formalize this is to think of the Wald test z-scores from DESeq2 as following a two-groups model, with a null and an alternative distribution. The "theoretical null" should be a N(0,1) distribution, but sometimes this is not the case, and one gets p-values that are too high in the low range, leading to a hill-shaped p-value histogram.
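To see the miscalibration in numbers, here is a small Python sketch (a made-up setting: the true null z-scores have spread 0.7, but the p-values are computed against the theoretical N(0,1) null). Far fewer than 5% of the null p-values then fall below 0.05; this is the mass that goes missing from the left end of the histogram:

```python
import math, random

random.seed(3)

def p_two_sided(z, sigma=1.0):
    """Two-sided p-value of z under an assumed N(0, sigma) null."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / (sigma * math.sqrt(2.0)))))

# Null z-scores whose true spread is 0.7 (made up), not the theoretical 1.
zs = [random.gauss(0, 0.7) for _ in range(20000)]

frac_theoretical = sum(p_two_sided(z, 1.0) < 0.05 for z in zs) / len(zs)
frac_true_null   = sum(p_two_sided(z, 0.7) < 0.05 for z in zs) / len(zs)
# Against N(0,1) the fraction is far below 0.05; against the true null it is ~0.05.
print(frac_theoretical, frac_true_null)
```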

See section 5 of

http://www-huber.embl.de/users/klaus/Teaching/Testing-lab.pdf

for a simulated example that should make this clear, and see

http://dx.doi.org/10.1515/sagmb-2013-0074

for a recent paper discussing this phenomenon.

Now a simple solution to this problem (also followed in the SAGMB paper) is to include a variance parameter sigma in the null distribution of the z-scores and model it as N(0, sigma). If you have a hill-shaped p-value histogram, the theoretical sigma of 1 is most probably too high, and fdrtool can be used to estimate the sigma parameter from the data, leading to an estimate less than 1.
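fdrtool fits the empirical null by truncated maximum likelihood; purely as a stdlib-only illustration of the same idea, a robust scale estimate such as the MAD of the z-scores also ignores the DE tail and recovers a sigma well below 1 (the simulated numbers below are made up, not fdrtool's actual algorithm):

```python
import math, random, statistics

random.seed(4)

# Hypothetical z-scores: 90% nulls with true spread 0.7 plus 10% DE genes.
zs = [random.gauss(0, 0.7) for _ in range(9000)] + \
     [random.gauss(3, 0.7) for _ in range(1000)]

# Robust scale estimate of the central (null) bulk of the z-scores.
med = statistics.median(zs)
mad = statistics.median(abs(z - med) for z in zs)
sigma_hat = 1.4826 * mad  # consistency factor for normally distributed data
print(sigma_hat < 1)  # True: the empirical null is narrower than N(0, 1)

def p_two_sided(z, sigma):
    """Two-sided p-value of z under a N(0, sigma) null."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / (sigma * math.sqrt(2.0)))))

# Re-computing the p-values against the empirical null largely flattens the hill.
pvals = [p_two_sided(z, sigma_hat) for z in zs]
```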

See Efron's review article

http://arxiv.org/abs/0808.0572

and section 6 of his book "Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction" for details on empirical null estimation.

Another important question is where the problem with the "incorrect" p-values actually comes from.

My guess is that it often stems from the fact that DESeq2, just like LIMMA, models one variance parameter per gene (the dispersion). This implicitly assumes that the within-group variability is the same across groups. If this is not the case, e.g. because the control group is less variable than the treatment group(s), it can lead to wrong p-values. If the sample size is quite low, outlying samples might also cause the phenomenon.
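A toy simulation of this effect (made-up numbers; a normal reference is used instead of a t distribution for simplicity): when a small, quiet control group is pooled with a larger, noisier treatment group, the pooled variance overstates the variance of the mean difference, so the null p-values are pushed away from zero:

```python
import math, random

random.seed(6)

def pooled_z(a, b):
    """Difference-of-means statistic with a pooled variance estimate,
    as if both groups shared one variance parameter."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def p_two_sided(z):
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Null genes, but the larger group is three times as variable (made up).
pvals = []
for _ in range(5000):
    a = [random.gauss(0, 1) for _ in range(3)]   # small, quiet control group
    b = [random.gauss(0, 3) for _ in range(9)]   # larger, noisier treatment group
    pvals.append(p_two_sided(pooled_z(a, b)))

frac = sum(p < 0.05 for p in pvals) / len(pvals)
print(frac)  # well below the nominal 0.05: p-values pushed away from zero
```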

Now one could model different variance parameters per group, but these are much harder to estimate, so this is difficult to do automatically in a software package. For an analysis that uses multiple dispersion parameters per feature, see:

http://www-huber.embl.de/pub/pdf/Reyes2013.pdf

So it is easier to simply re-estimate the null model from the test statistics and then use this empirical null model to compute the p-values.

Best wishes,

Bernd

Thanks Bernd! Just to mention: another possible cause of a non-flat shape (in the center and right side) of the p-value histogram is the discreteness of very low counts, which sometimes make up the majority of the rows in a dataset. Removing the low-count rows (e.g. row mean < 5) sometimes recovers a flat shape (in the center and right side), which tells us that the p-values were in fact properly calibrated. Another reason is if you use lfcThreshold with a positive threshold: then we no longer expect uniform p-values, due to the conservativeness of the test construction (see the DESeq2 paper).
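A minimal sketch of such a filter on a hypothetical count matrix (the row-mean threshold of 5 is just the example value from above):

```python
# Hypothetical genes x samples count matrix; keep rows with mean >= 5
# before inspecting the p-value histogram.
counts = [
    [0, 1, 0, 2],       # very low counts: p-values are discrete, not uniform
    [12, 15, 9, 20],
    [3, 2, 4, 1],
    [100, 80, 120, 95],
]
keep = [row for row in counts if sum(row) / len(row) >= 5]
print(len(keep))  # 2 rows survive the filter
```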

Thank you very much Bernd and Michael for the detailed explanation and references!

I have indeed seen that removing (very) low count rows can in some cases recover a flat shape. In the cases where the hill shape remains evident, I was wondering whether this p-value correction procedure is the best way to proceed, or whether it might lead to "too many" rejections of the null, so that it might in some cases be safer to be more conservative. I have so far tried the correction procedure in only a couple of cases and found that the number of rejections can change dramatically (but I have seen this in a case with a very small sample size). I will carefully read your suggested references!

As a side note, I have read in other posts (for example here: Right skewed histogram of p-values) that a "right skew" of the histogram can be a sign of a covariate that is not controlled for. However, I assumed this refers to cases where the histogram shows a roughly monotonic increase in frequency with increasing p-values, as in the simulated data by Simon in this post.

Fulvia

Hi Fulvia,

yes, a "monotone" histogram might in fact be due to unexplained covariates. In the original publication of the method implemented in RUVSeq (http://bioconductor.org/packages/release/bioc/html/RUVSeq.html),

"Using control genes to correct for unwanted variation in microarray data"

http://biostatistics.oxfordjournals.org/content/13/3/539.full

the authors argue that one should remove as many hidden variables / factors as necessary to obtain a "proper" p-value histogram. See their section 2.2 and figure 2 for an example, very similar to Simon's, on real data with severe batch effects.
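A crude stdlib-only sketch of the control-gene idea (not the actual RUVSeq estimator, which is based on factor analysis; all numbers are made up): estimate the unwanted factor as the per-sample average of negative-control genes, then regress it out of each gene:

```python
import random, statistics

random.seed(5)
n = 12  # samples
batch = [random.gauss(0, 1) for _ in range(n)]  # hidden factor

# 50 negative-control genes: driven by the batch factor, not by condition.
controls = [[batch[j] + random.gauss(0, 0.3) for j in range(n)]
            for _ in range(50)]

# Estimate the hidden factor as the per-sample average of the controls ...
w = [statistics.mean(g[j] for g in controls) for j in range(n)]

# ... then regress it out of each gene of interest (simple least squares).
def residual(y, x):
    mx, my = statistics.mean(x), statistics.mean(y)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]

gene = [2.0 * batch[j] + random.gauss(0, 0.5) for j in range(n)]
cleaned = residual(gene, w)
print(statistics.pstdev(gene) > statistics.pstdev(cleaned))  # True
```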

Hope that helps,

Bernd

Thanks a lot Bernd for the further suggestions!