Search
Question: DESEq2 p-values weird distribution under the H0 and fdrtool
0
gravatar for thyagoleal
12 weeks ago by
thyagoleal20
thyagoleal20 wrote:

I'm analyzing an RNAseq with the following samples/groups: 

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X. 

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette. 

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf), I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code:

res.de.fixed <- fdrtoolres.de$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray. 

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above. 

P-values before

After FDR correction

 

 

 

 

 

 

 

Edit.2. Added PCA from DESeq2::plotPCA()

PCA

From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.

Thanks.

Thyago

ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by thyagoleal20

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

ADD REPLYlink written 12 weeks ago by Michael Love14k

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc. 

Thank you.

ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by thyagoleal20
1
gravatar for Bernd Klaus
12 weeks ago by
Bernd Klaus470
Germany
Bernd Klaus470 wrote:

Hi Thyago,

 

The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

http://bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.

ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by Bernd Klaus470
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 156 users visited in the last hour