Search
Question: Inspection and Correction of p-values in DESeq2 pipeline by Bernd
1
19 months ago by
deena10
Germany
deena10 wrote:

Hi Michael,

I following a tutorial from Huber(http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) for detecting Diff.expressed genes using DESeq2. In the section, 10.2 Inspection and correction of p–values  it suggest that if the distribution of raw pvalues follows either hill-shaped histogram or U-shaped hisogram, then the Wald statistic was used re-compute the padj values.

I am working on RNAseq data where I compare the Untreated to knock-down of particular gene. Almost in all my comparisons, I have the hill-shaped the histogram in all my comparison so should I use fdr tool as prescribed in the tutorial to correct my padj values?

When I implement this fdr package my results were drastically changed. Before Fdr package implementation it was 61, after FDR package implementation it came to 260. I am confused now to use which method. Kindly guide me

modified 19 months ago by Bernd Klaus520 • written 19 months ago by deena10
0
19 months ago by
Michael Love19k
United States
Michael Love19k wrote:

This is somewhat of a subjective decision, particularly based on plots. I typically don't have the need to use fdrtool to re-calculate the adjusted p-values. It would help if you could add plots, and also all your code. Note that you should note use fdrtool in combination with lfcThreshold > 0.

HI Michael, Like you said I have attached the the histogram of pvalues and the R code data = read.delim("raw_counts.txt", row.names=1) rnaseqMatrix = as.matrix(round(data)) conditions = data.frame(conditions=factor(c(rep("Control", 3), rep("gene_x_kd", 3), rep("gene_y_kd", 3), rep("genexy_kd", 3)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix(countData = rnaseqMatrix,colData = conditions,design = ~ conditions) dds = DESeq(ddsFullCountTable) res_gene_x_vs_Control = results(dds,contrast = c("conditions","gene_x_kd","Control")) res_gene_y_vs_Control = results(dds,contrast= c("conditions","gene_y_kd","Control")) res_gene_xy_vs_Control = results(dds,contrast = c("conditions","genexy_kd","Control")) hist(res_gene_x_vs_Control$pvalue, col = "lavender",main="Gene_X_KD_vs_Control", xlab = "p-values") hist(res_gene_y_vs_Control$pvalue, col = "lavender",main="Gene_Y_KD_vs_Control", xlab = "p-values") hist(res_gene_xy_vs_Control$pvalue,main="GeneXY_KD_vs_Control", col = "lavender", xlab = "p-values") Let me know if you need any more details. Kindly guide me. Thanks in advance On Sat, Mar 4, 2017 at 8:37 PM, Michael Love [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Michael Love <https: support.bioconductor.org="" u="" 5822=""/> wrote Answer: > Inspection and Correction of p-values in DESeq2 pipeline by Bernd > <https: support.bioconductor.org="" p="" 93333="" #93360="">: > > This is somewhat of a subjective decision, particularly based on plots. I > typically don't have the need to use fdrtool to re-calculate the adjusted > p-values. It would help if you could add plots, and also all your code. > Note that you should note use fdrtool in combination with lfcThreshold > 0. > > ------------------------------ > > Post tags: deseq2, FDR, fdrtool > > You may reply via email or visit https://support.bioconductor. > org/p/93333/#93360 > ADD REPLYlink written 19 months ago by deena10 As Mike said, this is a quite subjective decision. If the p-value is hill shaped, the p-values are on avarage "higher than expected" leading to a potentially to a lower power. However, Wolfgang pointed me to the fact that these histogram shapes can result from batch effects in your data that overlap experimental groups. See the following simulated example, where samples 6-15 are (6-10 are in group 1, 11-15 are in group two) are affected by the same batch effect. library(tidyverse) library(genefilter)  n <- 10000 m <- 20 x <- matrix(rnorm(n*m), nrow = n, ncol = m) fac <- factor(c(rep(0, 10), rep(1, 10))) rt1 <- rowttests(x, fac) qplot(rt1$p.value, fill = I("tan3"))

x[, 6:15] <- x[, 6:15]+5 rt2 <- rowttests(x, fac)

qplot(rt2\$p.value, fill = I("coral3"))

Therefore I suggest that you take a look at the PCA plot to try to see whether  you have any clustering by batches and then try to see whether, when you run the DE analysis between batches, you get  "nice" p-value histograms.

Best wishes,

Bernd

HI Bernd,

I tried to find batch effect on samples but it didnt seems to be like presence of batch effect. The historgrams of specific comparisons of a particular time point follows hill-shaped structure even after the fdr tool correction. The z-score computed for that specific results are z-Score(sd: 0.644; eta0= 0.9). Now what does that imply, is there any problem with the samples? The number significant(padj <0.05) of differential expressed genes for this specific comparison was around 50. Will few number of diff. genes also influence hill shaped histogram? Kindly guide me as this is the first time I am facing this type of problem

0
19 months ago by
Bernd Klaus520
Germany
Bernd Klaus520 wrote:

If you are sure that there are no batch effects, try to run a stronger prefiltering of your data, e.g.

 rowSums(counts(dds)) > 20


or something like that since you had spikes near 1 in the p-value histograms you sent to my by mail. The fdrtool results cannot directly indicate whether there is a problem with the samples, but it might be confused by the spikes near 1. The number of differentially expressed genes will influence the et0 that FDR tool estimates. An eta0 of 90% means that 90% of your genes are probably not differentially expressed.