Interpreting the results from lfcShrink() with apeglm in DESeq2

Dear all,

I'm relatively new to differential expression analysis of RNA-Seq data, and I'm working my way through the DESeq2 vignette to understand how it works. I'm looking to obtain a set of genes with a log2 fold change > 2 and an adjusted p-value < 0.001.

Following the "standard" procedure, I'm able to get the results using the following command:

res_groupA_vs_groupB <- results(dds, contrast = c("Tissue", "groupA", "groupB"), lfcThreshold = 2, alpha = 0.001)
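The following is a minimal, self-contained sketch of this step. It assumes DESeq2 is installed and uses its simulated data from `makeExampleDESeqDataSet()`, so the factor is `condition` with levels `A` and `B` instead of `Tissue`, `groupA` and `groupB`. Two points are worth keeping in mind. With `lfcThreshold = 2`, the p-values test whether |LFC| exceeds 2, which is stricter than filtering on the estimated fold change. Also, `alpha` only tunes independent filtering and the `summary()` report, so you still need to select genes on `padj` explicitly.

```r
library(DESeq2)

set.seed(1)
# Simulated data: 1000 genes, 6 samples, factor 'condition' with levels A and B;
# betaSD = 2 gives some genes large true log2 fold changes
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 2)
dds <- DESeq(dds)

# Wald test of H0: |LFC| <= 2 for B vs A; alpha = 0.001 tunes independent filtering
res <- results(dds, contrast = c("condition", "B", "A"),
               lfcThreshold = 2, alpha = 0.001)
summary(res)

# Select genes explicitly; which() drops genes whose padj is NA
sig <- res[which(res$padj < 0.001), ]
nrow(sig)
```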

This results() call gave me sensible output, including adjusted p-values that I'm comfortable interpreting. I then noticed the lfcShrink() function and read about its benefits, so I ran the following command:

res2_groupA_vs_groupB <- lfcShrink(dds, coef = 2, type = "apeglm", lfcThreshold = 2)
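Here is a sketch of the shrinkage step on the same kind of simulated data. It assumes both DESeq2 and apeglm are installed. `type = "apeglm"` shrinks a single model coefficient and does not accept `contrast`. It is therefore worth checking `resultsNames(dds)` to confirm that coefficient 2 is the same comparison, in the same direction, as the contrast passed to `results()`. If the reference level differs, `relevel()` the factor and re-run `DESeq()`. With `lfcThreshold > 0`, the returned table should contain an `svalue` column instead of `pvalue` and `padj`.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 2)
dds <- DESeq(dds)

# apeglm shrinks a single coefficient: check its name and direction first
resultsNames(dds)  # "Intercept" "condition_B_vs_A"

# With lfcThreshold > 0, s-values refer to 'false sign or small' (FSOS) events
res_shr <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm",
                     lfcThreshold = 2)

# Select genes on the s-value; which() guards against any NA values
sig_shr <- res_shr[which(res_shr$svalue < 0.001), ]
nrow(sig_shr)
```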

With lfcShrink(), s-values are provided instead. From the documentation, these "provide the probability of false signs among the tests with equal or smaller s-value than a given gene's s-value". I read through the Stephens (2016) reference to understand them better, but I'm still a little uncertain about how to interpret s-values. As stated above, I decided a priori to focus on genes with an adjusted p-value < 0.001, but I'm not sure whether the same logic applies to s-values (i.e., focusing on genes with an s-value < 0.001). Is the interpretation analogous, or am I on the wrong track here?
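It may help to see how an s-value is built, because that shows why an s-value cutoff reads much like an adjusted p-value cutoff. In Stephens (2016), each gene first gets a local false sign rate (lfsr): the posterior probability that the sign of its LFC is wrong. The gene's s-value is then the average lfsr over all genes whose lfsr is at most its own. The base R sketch below computes this on made-up lfsr values. The function name `svalue_from_lfsr` is hypothetical. The sketch mirrors the definition but is an illustration, not the apeglm or ashr source code.

```r
# Hypothetical local false sign rates (lfsr) for six genes: the posterior
# probability that each gene's LFC has the wrong sign
lfsr <- c(0.0001, 0.2, 0.0004, 0.05, 0.0002, 0.9)

# s-value: mean lfsr among all genes with lfsr <= this gene's lfsr,
# i.e. the expected proportion of false signs if all of them are called
svalue_from_lfsr <- function(lfsr) {
  o <- order(lfsr)
  s <- numeric(length(lfsr))
  s[o] <- cumsum(lfsr[o]) / seq_along(lfsr)
  s
}

s <- svalue_from_lfsr(lfsr)
round(s, 5)
which(s < 0.001)  # 1 3 5
```

Calling every gene with s < 0.001 therefore keeps the estimated proportion of errors among the called genes below 0.001. This is analogous to padj < 0.001, except that an "error" means a false sign. When lfcThreshold is set, it means a false sign or an LFC smaller than the threshold.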

Any advice or suggestions for further reading would be greatly appreciated.


Hi Charles,

The adjusted p-values and s-values are similar, but they use different definitions of error. One focuses on falsely rejecting genes that are truly null. The other focuses on getting the sign of the LFC wrong (or, when lfcThreshold is set, getting the sign wrong or the LFC being smaller than the threshold). You can use whichever you prefer or feel more comfortable with.
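Both quantities are built along similar lines. Each gene's value is the smallest error-rate cutoff at which that gene would be called, with the error rate taken over all genes called alongside it. What differs is what counts as an error. For comparison, below is the Benjamini-Hochberg adjustment that DESeq2 uses by default for `padj` (`pAdjustMethod = "BH"`). It is written out in base R on made-up p-values and checked against `stats::p.adjust()`. The function name `bh_manual` is hypothetical and only for illustration.

```r
# Hypothetical raw p-values for six genes
p <- c(1e-6, 0.03, 2e-5, 0.004, 5e-6, 0.6)

# Benjamini-Hochberg: the gene ranked k (smallest p first) gets the smallest
# value of p_(j) * n / j over all j >= k, capped at 1
bh_manual <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)
  padj <- numeric(n)
  padj[o] <- pmin(1, cummin(p[o] * n / (n:1)))
  padj
}

padj <- bh_manual(p)
all.equal(padj, p.adjust(p, method = "BH"))  # TRUE
which(padj < 0.001)                          # 1 3 5
```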

Because we were computing posterior distributions in apeglm, and because we were adding ashr to lfcShrink at the same time, we decided it made sense to provide s-values as output. I think one clear benefit of outputting s-values is that, during method development, we can assess and benchmark power and error control simultaneously on real data. With a point null hypothesis of LFC = 0, it's very difficult to find real datasets that contain both truly null and non-null genes.


Hi Michael,

Great, thanks for the clarification, and great program!

