DESEq2 p-values weird distribution under the H0 and fdrtool
Entering edit mode
tcalvo • 40
Last seen 1 hour ago

I'm analyzing an RNAseq with the following samples/groups: 

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X. 

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette. 

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (, I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code: <-$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray. 

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above. 

P-values before

After FDR correction








Edit.2. Added PCA from DESeq2::plotPCA()


From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.



rnaseq deseq2 fdrtool differential gene expression • 1.6k views
Entering edit mode

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

Entering edit mode

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc. 

Thank you.

Entering edit mode
Bernd Klaus • 590
Last seen 2.1 years ago

Hi Thyago,


The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.


Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3