Search
Question: Inspection and Correction of p-values in DESeq2 pipeline by Bernd
0
gravatar for deena
8 months ago by
deena0
Germany
deena0 wrote:

Hi Michael,

I following a tutorial from Huber(http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) for detecting Diff.expressed genes using DESeq2. In the section, 10.2 Inspection and correction of p–values  it suggest that if the distribution of raw pvalues follows either hill-shaped histogram or U-shaped hisogram, then the Wald statistic was used re-compute the padj values.

 

I am working on RNAseq data where I compare the Untreated to knock-down of particular gene. Almost in all my comparisons, I have the hill-shaped the histogram in all my comparison so should I use fdr tool as prescribed in the tutorial to correct my padj values?

When I implement this fdr package my results were drastically changed. Before Fdr package implementation it was 61, after FDR package implementation it came to 260. I am confused now to use which method. Kindly guide me

ADD COMMENTlink modified 8 months ago by Bernd Klaus470 • written 8 months ago by deena0
0
gravatar for Michael Love
8 months ago by
Michael Love14k
United States
Michael Love14k wrote:

This is somewhat of a subjective decision, particularly based on plots. I typically don't have the need to use fdrtool to re-calculate the adjusted p-values. It would help if you could add plots, and also all your code. Note that you should note use fdrtool in combination with lfcThreshold > 0.

ADD COMMENTlink written 8 months ago by Michael Love14k
HI Michael, Like you said I have attached the the histogram of pvalues and the R code data = read.delim("raw_counts.txt", row.names=1) rnaseqMatrix = as.matrix(round(data)) conditions = data.frame(conditions=factor(c(rep("Control", 3), rep("gene_x_kd", 3), rep("gene_y_kd", 3), rep("genexy_kd", 3)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix(countData = rnaseqMatrix,colData = conditions,design = ~ conditions) dds = DESeq(ddsFullCountTable) res_gene_x_vs_Control = results(dds,contrast = c("conditions","gene_x_kd","Control")) res_gene_y_vs_Control = results(dds,contrast= c("conditions","gene_y_kd","Control")) res_gene_xy_vs_Control = results(dds,contrast = c("conditions","genexy_kd","Control")) hist(res_gene_x_vs_Control$pvalue, col = "lavender",main="Gene_X_KD_vs_Control", xlab = "p-values") hist(res_gene_y_vs_Control$pvalue, col = "lavender",main="Gene_Y_KD_vs_Control", xlab = "p-values") hist(res_gene_xy_vs_Control$pvalue,main="GeneXY_KD_vs_Control", col = "lavender", xlab = "p-values") Let me know if you need any more details. Kindly guide me. Thanks in advance On Sat, Mar 4, 2017 at 8:37 PM, Michael Love [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Michael Love <https: support.bioconductor.org="" u="" 5822=""/> wrote Answer: > Inspection and Correction of p-values in DESeq2 pipeline by Bernd > <https: support.bioconductor.org="" p="" 93333="" #93360="">: > > This is somewhat of a subjective decision, particularly based on plots. I > typically don't have the need to use fdrtool to re-calculate the adjusted > p-values. It would help if you could add plots, and also all your code. > Note that you should note use fdrtool in combination with lfcThreshold > 0. > > ------------------------------ > > Post tags: deseq2, FDR, fdrtool > > You may reply via email or visit https://support.bioconductor. > org/p/93333/#93360 >
ADD REPLYlink written 8 months ago by deena0

As Mike said, this is a quite subjective decision. If the p-value is hill shaped, the p-values are on avarage "higher than expected" leading to a potentially to a lower power.

However, Wolfgang pointed me to the fact that these histogram shapes can result from batch effects in your data that overlap experimental groups. See the following simulated example, where samples 6-15 are (6-10 are in group 1, 11-15 are in group two) are affected by the same batch effect.

library(tidyverse)
library(genefilter)

n <- 10000
m <- 20
x <- matrix(rnorm(n*m), nrow = n, ncol = m)
fac <- factor(c(rep(0, 10), rep(1, 10)))
rt1 <- rowttests(x, fac)

qplot(rt1$p.value, fill = I("tan3"))

x[, 6:15] <- x[, 6:15]+5
rt2 <- rowttests(x, fac)

qplot(rt2$p.value, fill = I("coral3"))

Therefore I suggest that you take a look at the PCA plot to try to see whether  you have any clustering by batches and then try to see whether, when you run the DE analysis between batches, you get  "nice" p-value histograms.

Best wishes,

Bernd

 

ADD REPLYlink modified 8 months ago • written 8 months ago by Bernd Klaus470

HI Bernd,

I tried to find batch effect on samples but it didnt seems to be like presence of batch effect. The historgrams of specific comparisons of a particular time point follows hill-shaped structure even after the fdr tool correction. The z-score computed for that specific results are z-Score(sd: 0.644; eta0= 0.9). Now what does that imply, is there any problem with the samples? The number significant(padj <0.05) of differential expressed genes for this specific comparison was around 50. Will few number of diff. genes also influence hill shaped histogram? Kindly guide me as this is the first time I am facing this type of problem

ADD REPLYlink modified 8 months ago • written 8 months ago by deena0
0
gravatar for Bernd Klaus
8 months ago by
Bernd Klaus470
Germany
Bernd Klaus470 wrote:

If you are sure that there are no batch effects, try to run a stronger prefiltering of your data, e.g.

 rowSums(counts(dds)) > 20 

or something like that since you had spikes near 1 in the p-value histograms you sent to my by mail. The fdrtool results cannot directly indicate whether there is a problem with the samples, but it might be confused by the spikes near 1. The number of differentially expressed genes will influence the et0 that FDR tool estimates. An eta0 of 90% means that 90% of your genes are probably not differentially expressed.

 

 

ADD COMMENTlink written 8 months ago by Bernd Klaus470
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 145 users visited in the last hour