Question: DESEq2 p-values weird distribution under the H0 and fdrtool
gravatar for thyagoleal
4 months ago by
thyagoleal20 wrote:

I'm analyzing an RNAseq with the following samples/groups: 

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X. 

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette. 

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (, I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code: <-$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray. 

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above. 

P-values before

After FDR correction








Edit.2. Added PCA from DESeq2::plotPCA()


From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.



ADD COMMENTlink modified 4 months ago • written 4 months ago by thyagoleal20

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

ADD REPLYlink written 4 months ago by Michael Love15k

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc. 

Thank you.

ADD REPLYlink modified 4 months ago • written 4 months ago by thyagoleal20
gravatar for Bernd Klaus
4 months ago by
Bernd Klaus480
Bernd Klaus480 wrote:

Hi Thyago,


The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Bernd Klaus480
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 132 users visited in the last hour