DESeq2 and fdrtool: applying FDR correction when the empirical sigma is greater than 1
1
1
Entering edit mode
dhibar ▴ 60
@dhibar-10079
Last seen 4.1 years ago

Hello,

I am following the guide found here (http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) to estimate differential gene expression in my two-condition RNAseq dataset. When I look at my histogram of observed p-value statistics I end up with a plot like this:

dds <-  nbinomWaldTest(dds)
ddsRes <- results(dds, pAdjustMethod = "BH")
ddsRes <- ddsRes[ !is.na(ddsRes$padj), ] ddsRes <- ddsRes[ !is.na(ddsRes$pvalue), ]
table(ddsRes[,"padj"] < 0.1) #240 TRUE 19923 FALSE
hist(ddsRes$pvalue, col = "lavender", main = "TRT vs NoTRT", xlab = "p-values") When I run fdrtool on this dataset the estimated empirical sigma is 1.176: FDR.ddsRes <- fdrtool(ddsRes$stat, statistic= "normal", plot = T)
ddsRes[,"padj"]  <- p.adjust(FDR.ddsRes$pval, method = "BH") table(ddsRes[,"padj"] < 0.1) #27 TRUE 20136 FALSE When I look at the number of rejected hypotheses after running the fdrtool procedure the number is much smaller (27 rejections compared to 240 using the BH FDR adjustment directly). I'm wondering: 1) is it appropriate to use the fdrtool procedure when sigma is estimated to be >1? All of the examples I've seen thus far (including in the DESeq tutorials) focus on the case where sigma <1. 2) if the answer to 1) is yes, what are some of the assumptions about the experiment and results that would make a sigma >1 need to be controlled with the more conservative (in this case) fdrtool correction? Any help would be greatly appreciated! deseq2 fdrtool • 1.9k views ADD COMMENT 1 Entering edit mode Bernd Klaus ▴ 590 @bernd-klaus-6281 Last seen 2.7 years ago Germany Hi, thanks for your feedback! 1.) yes, it is appropriate to use fdrtool if sigma > 1. After all, the null model variance can also be underestimated. That being said, keep in mind that the sigma is an *estimate* and judging your p-value histogram, it seems that the original p-vlaue might just be little bit to "liberal" (you seem to have a slight enrichment of low p-values), so you can just keep them as they are. It is not a "must", I only applied emprical null modelling in the example because the p-values clearly had a strange shape. You can also try to use the fdrtool qvalues (= FDR values) directly. What does table(FDR.ddsRes$qval < 0.1) give you ?

Also try locfdr maybe, (https://cran.r-project.org/web/packages/locfdr) ,

which sigma  does locfdr(ddsRes\$stat) give you?

2.) Basically, if fdrtool says that your actual sigma is greater than one, it means that your p-values are  a bit too low on average, so that you might have a higher proportion of false positives than the nominal FDR suggests. You also see this in the histogram, apart from peak at zero, the bars up to a pval of ~ 0.2 are maybe a bit too high be truly uniform.

Hope that helps,

Bernd