Question

DESeq2 and fdrtool: applying FDR correction when the empirical sigma is greater than 1

1

Entering edit mode

dhibar ▴ 60

@dhibar-10079

Last seen 7.0 years ago

Hello,

I am following the guide found here (http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) to estimate differential gene expression in my two-condition RNAseq dataset. When I look at my histogram of observed p-value statistics I end up with a plot like this:

dds <- nbinomWaldTest(dds)
ddsRes <- results(dds, pAdjustMethod = "BH")
ddsRes <- ddsRes[ !is.na(ddsRes$padj), ]
ddsRes <- ddsRes[ !is.na(ddsRes$pvalue), ]
table(ddsRes[,"padj"] < 0.1) #240 TRUE 19923 FALSE
hist(ddsRes$pvalue, col = "lavender", main = "TRT vs NoTRT", xlab = "p-values")

When I run fdrtool on this dataset the estimated empirical sigma is 1.176:

FDR.ddsRes <- fdrtool(ddsRes$stat, statistic= "normal", plot = T)
ddsRes[,"padj"] <- p.adjust(FDR.ddsRes$pval, method = "BH")
table(ddsRes[,"padj"] < 0.1) #27 TRUE 20136 FALSE

When I look at the number of rejected hypotheses after running the fdrtool procedure the number is much smaller (27 rejections compared to 240 using the BH FDR adjustment directly).

I'm wondering:

1) is it appropriate to use the fdrtool procedure when sigma is estimated to be >1? All of the examples I've seen thus far (including in the DESeq tutorials) focus on the case where sigma <1.

2) if the answer to 1) is yes, what are some of the assumptions about the experiment and results that would make a sigma >1 need to be controlled with the more conservative (in this case) fdrtool correction?

Any help would be greatly appreciated!

deseq2 fdrtool • 3.5k views

ADD COMMENT • link updated 7.9 years ago by Bernd Klaus ▴ 610 • written 7.9 years ago by dhibar ▴ 60

score 1 · Answer 1 · 2016-06-15

Hi,

thanks for your feedback!

1.) yes, it is appropriate to use fdrtool if sigma > 1. After all, the null model variance can also be underestimated. That being said, keep in mind that the sigma is an *estimate* and judging your p-value histogram, it seems that the original p-vlaue might just be little bit to "liberal" (you seem to have a slight enrichment of low p-values), so you can just keep them as they are. It is not a "must", I only applied emprical null modelling in the example because the p-values clearly had a strange shape.

You can also try to use the fdrtool qvalues (= FDR values) directly. What does

table(FDR.ddsRes$qval < 0.1) give you ?

Also try locfdr maybe, (https://cran.r-project.org/web/packages/locfdr) ,

which sigma does locfdr(ddsRes$stat) give you?

2.) Basically, if fdrtool says that your actual sigma is greater than one, it means that your p-values are a bit too low on average, so that you might have a higher proportion of false positives than the nominal FDR suggests. You also see this in the histogram, apart from peak at zero, the bars up to a pval of ~ 0.2 are maybe a bit too high be truly uniform.

Hope that helps,

Bernd