DESEq2 p-values weird distribution under the H0 and fdrtool
0
Entering edit mode
tcalvo • 40
@tcalvo-12466
Last seen 1 hour ago

I'm analyzing an RNAseq with the following samples/groups:

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X.

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette.

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf), I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code:

res.de.fixed <- fdrtoolres.de\$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray.

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above.

From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.

Thanks.

Thyago

rnaseq deseq2 fdrtool differential gene expression • 1.6k views
0
Entering edit mode

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

0
Entering edit mode

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc.

Thank you.

2
Entering edit mode
Bernd Klaus • 590
@Bernd-Klaus-6281
Last seen 2.1 years ago
Germany

Hi Thyago,

The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

http://bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.