Hi Michael,
I am following a tutorial from Huber (http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) for detecting differentially expressed genes using DESeq2. Section 10.2, "Inspection and correction of p-values", suggests that if the histogram of raw p-values is either hill-shaped or U-shaped, the Wald statistics should be used to re-compute the padj values.
I am working on RNA-seq data where I compare untreated samples to a knock-down of a particular gene. In almost all of my comparisons the p-value histogram is hill-shaped, so should I use fdrtool as prescribed in the tutorial to correct my padj values?
When I applied the fdrtool package, my results changed drastically: the number of significant genes went from 61 before the correction to 260 after it. I am now confused about which method to use. Kindly guide me.
As Mike said, this is quite a subjective decision. If the p-value histogram is hill-shaped, the p-values are on average "higher than expected", which potentially leads to lower power.
However, Wolfgang pointed out to me that these histogram shapes can result from batch effects in your data that overlap the experimental groups. See the following simulated example, where samples 6-15 (6-10 are in group 1, 11-15 are in group 2) are affected by the same batch effect.
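library(tidyverse)   # provides qplot() via ggplot2
library(genefilter)  # provides rowttests()

# 10000 null "genes" measured in 20 samples, 10 per group
n <- 10000
m <- 20
x <- matrix(rnorm(n*m), nrow = n, ncol = m)
fac <- factor(c(rep(0, 10), rep(1, 10)))
rt1 <- rowttests(x, fac)
qplot(rt1$p.value, fill = I("tan3"))

# shift samples 6-15 to mimic a batch effect that spans both groups
x[, 6:15] <- x[, 6:15]+5
rt2 <- rowttests(x, fac)
qplot(rt2$p.value, fill = I("coral3"))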
Therefore, I suggest that you look at the PCA plot to check whether your samples cluster by batch, and then check whether you get "nice" p-value histograms when you run the DE analysis between batches.
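A minimal sketch of such a PCA check on simulated data like the example above. It uses only base R `prcomp` rather than DESeq2's plotting helpers, and the `batch` labels are hypothetical, introduced just for colouring the plot. With real count data you would presumably run the PCA on transformed (e.g. variance-stabilised) values instead.

```r
set.seed(1)                        # fixed seed so the sketch is reproducible
n <- 2000                          # fewer "genes" than above, to keep it fast
m <- 20
x <- matrix(rnorm(n * m), nrow = n, ncol = m)
x[, 6:15] <- x[, 6:15] + 5         # same batch shift as in the example above

group <- factor(rep(c(0, 1), each = 10))
# hypothetical batch labels: samples 6-15 form batch "B"
batch <- factor(ifelse(seq_len(m) %in% 6:15, "B", "A"))

# PCA with samples as rows and genes as columns
pca <- prcomp(t(x))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# colour by batch, plotting symbol by experimental group
plot(pca$x[, 1], pca$x[, 2],
     col = c("steelblue", "coral3")[as.integer(batch)],
     pch = c(16, 17)[as.integer(group)],
     xlab = sprintf("PC1 (%.0f%% variance)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%% variance)", 100 * var_explained[2]))
legend("top", legend = c("batch A", "batch B"),
       col = c("steelblue", "coral3"), pch = 16)
```

If the samples separate along the first component by something other than the experimental group, as they do by construction here, that points to a batch-like effect worth including in the design.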
Best wishes,
Hi Bernd,
I looked for a batch effect in the samples, but there does not seem to be one. The histograms for the comparisons at one particular time point remain hill-shaped even after the fdrtool correction. The parameters reported for the z-scores of those results are sd = 0.644 and eta0 = 0.9. What does that imply? Is there a problem with the samples? The number of significantly differentially expressed genes (padj < 0.05) for this comparison was around 50. Can a small number of differentially expressed genes also lead to a hill-shaped histogram? Kindly guide me, as this is the first time I am facing this type of problem.
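For orientation, here is a small sketch of what an estimated null standard deviation below 1 means. In fdrtool's output, sd is the standard deviation of the fitted null distribution and eta0 is the estimated proportion of null features. The sketch uses only base R and simulated z-scores, not fdrtool itself, and the rescaling step is a simplified stand-in for the empirical null model that fdrtool fits. The value 0.644 is taken from the output quoted above; everything else is assumed for illustration.

```r
set.seed(1)                 # fixed seed so the sketch is reproducible
n <- 10000
null_sd <- 0.644            # null sd quoted above

# simulated "Wald statistics": all null, but less dispersed than N(0, 1)
z <- rnorm(n, mean = 0, sd = null_sd)

# p-values computed under the theoretical N(0, 1) null
p_raw <- 2 * pnorm(-abs(z))
# p-values after rescaling by the (here known) null sd
p_rescaled <- 2 * pnorm(-abs(z) / null_sd)

# fraction of p-values below 0.05; a uniform null would give about 5%
frac_raw <- mean(p_raw < 0.05)
frac_rescaled <- mean(p_rescaled < 0.05)
print(c(raw = frac_raw, rescaled = frac_rescaled))

# side-by-side histograms: raw is depleted of small p-values, rescaled is flat
op <- par(mfrow = c(1, 2))
hist(p_raw, breaks = 20, col = "coral3", main = "N(0, 1) null")
hist(p_rescaled, breaks = 20, col = "tan3", main = "rescaled null")
par(op)
```

In this simulation the raw p-values are larger than expected simply because the statistics vary less than the theoretical null assumes; it does not by itself say why that would be the case in real samples.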