Question

Interpreting the results from lfcShrink() with apeglm in DESeq2


charles.foster ▴ 160

@charlesfoster-17652

Australia

Dear all,

I'm relatively new to differential expression analysis of RNA-Seq data, and I'm working my way through the DESeq2 vignette to come to terms with how it works. I'm looking to obtain a set of genes with a log2 fold change >2, and an adjusted p-value of <0.001.

Following the "standard" procedure, I'm able to get the results using the following command:

res_groupA_vs_groupB <- results(dds, contrast = c("Tissue", "groupA", "groupB"), lfcThreshold = 2, alpha = 0.001)

This gave me sensible results, including adjusted p-values that I'm comfortable interpreting. I then noticed the lfcShrink function, and read about its benefits. I ran the following command:

res2_groupA_vs_groupB <- lfcShrink(dds, coef = 2, type = "apeglm", lfcThreshold = 2)

In this case, s-values are provided, and from the documentation I can see these "provide the probability of false signs among the tests with equal or smaller s-value than a given gene's s-value". I read through the Stephens (2016) reference to try to understand these more, but I'm still a little uncertain as to the interpretation of s-values. As stated above, I've decided a priori to focus on those genes with an adjusted p-value of <0.001, but I'm uncertain whether this same logic can be applied to s-values (i.e., focusing on genes with s-value <0.001). Can the interpretation be analogous, or am I on the wrong track here?

Any advice or suggestions for further reading would be greatly appreciated.

deseq2 apeglm lfcshrink • 2.5k views

updated 5.5 years ago by Michael Love 41k • written 5.5 years ago by charles.foster ▴ 160

score 4 · Accepted Answer · 2018-10-05

Hi Charles,

Adjusted p-values and s-values play a similar role, but they are built on different definitions of error. Adjusted p-values concern falsely rejecting genes that are truly null, while s-values concern getting the sign of the LFC wrong. You can use whichever you prefer or feel more comfortable with.
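To make the "equal or smaller s-value" wording concrete, here is a minimal base R sketch of how an s-value can be derived from per-gene local false sign rates, following the definition in Stephens (2016) as I understand it: a gene's s-value is the average local false sign rate over all genes ranked at or above it. The lfsr numbers and the helper name svalue_from_lfsr are made up for illustration, and this is not presented as the DESeq2 or apeglm internals (it also ignores NA handling).

```r
# Hypothetical local false sign rates (lfsr) for six genes.
# These numbers are invented purely for illustration.
lfsr <- c(geneA = 0.0001, geneB = 0.20, geneC = 0.0005,
          geneD = 0.004,  geneE = 0.05, geneF = 0.0002)

# Illustrative helper (not a DESeq2/apeglm function):
# the s-value of a gene is the mean lfsr over all genes whose
# lfsr is less than or equal to that gene's lfsr.
svalue_from_lfsr <- function(lfsr) {
  ord <- order(lfsr)                                  # most confident sign first
  running_mean <- cumsum(lfsr[ord]) / seq_along(ord)  # average lfsr in each top-k set
  s <- numeric(length(lfsr))
  s[ord] <- running_mean                              # map back to the input order
  names(s) <- names(lfsr)
  s
}

s <- svalue_from_lfsr(lfsr)

# Keep genes so that the expected fraction of wrong signs in the set is < 0.001.
selected <- names(s)[s < 0.001]

print(round(s, 5))
print(selected)
```

Read this way, a cutoff of s-value < 0.001 bounds the expected proportion of sign errors in the selected set, much as an adjusted p-value cutoff bounds the expected proportion of false rejections. In the sketch, geneD has an lfsr of 0.004 but an s-value of 0.0012, because the s-value averages over the more confident genes ranked above it.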

Because we were computing posterior distributions in apeglm, and because we were adding ashr to lfcShrink at the same time, we decided it made sense to provide s-values as output. I think there is one clear benefit to outputting s-values: during method development, we can assess and benchmark power and error control simultaneously with real data. With a point null hypothesis of LFC=0, it's very difficult to find real datasets in which there are both non-null genes and null genes.