Question: DESeq2 1.20.0 lfcshrink function
1
13 months ago by
emi.sch10
emi.sch10 wrote:

Dear all,

I would appreciate your feedback on this one. I have been using DESeq2(version 1.20.0) for differential expression analysis comparing response in cancer patients.

My question is about the lfcShrink() function. There have been some updates on the function from the previous versions of the package and I have read the documentation and many posts in here, but I have still some doubts on how I do my analysis.

I wanted to be strict strict so I set both thresholds for padj as well as log fold change:

padj.cutoff <- 0.05
lfc.cutoff <- 0.58

dds <- DESeq(dds)

res <- results(dds, contrast=c("response","PD","CR+PR"),
lfcThreshold = lfc.cutoff, alpha = padj.cutoff,
altHypothesis = "greaterAbs")

res_setThres <- lfcShrink(dds=dds, contrast=c("response","PD","CR+PR"),
res=res, lfcThreshold = lfc.cutoff)

res_noThres <- lfcShrink(dds=dds, contrast=c("response","PD","CR+PR"), res=res)

My question is: should I set the threshold for LFC inside the lfcShrink() function too? In the res_setThres the p-values have changed (as mentioned in the documentation) and I eventually get no DEGs based on these thresholds. Even when I increased padj to 0.1 I still got zero significant DEGs. My res_noThres table produces a small set of genes for these thresholds. Is it wrong if I use the lfcSchrink without setting the lfcThreshold and then subset my results table to those genes passing the filters? I know it is important  to set these thresholds in the results() function and not filter post-hoc, I was wondering if that applies to lfcshrink too.

Thank you!

modified 13 days ago by yusuf.zhc0 • written 13 months ago by emi.sch10
11
13 months ago by
Michael Love24k
United States
Michael Love24k wrote:

Good question. I can see how it's a bit confusing, as we have two inference frameworks now, one more traditional one around p-values with BH adjustment, augmented with IHW, and a new one around posterior estimates of effect sizes and "s-values", which is interfaced via lfcShrink().

It will help me explain things to first walk through some example code. If we make some data, and run DESeq():

dds <- makeExampleDESeqDataSet(betaSD=1)
dds <- DESeq(dds)

We can test against an LFC threshold of 1, from results():

res <- results(dds, lfcThreshold=1)

And we can find a set of genes where we reject the null that |beta| < 1 with an FDR bound of 0.1 (or 0.05 as you have above, specifying alpha).

As you say, this is not the same as the set of shrunken LFC > 1, just because they are different statistical procedures:

lfc1 <- lfcShrink(dds, coef=2, type="apeglm") # apeglm is now recommended

Although there is some overlap:

> table(test=res$padj < .1, shrink=lfc1$log2FoldChange > 1)
shrink
test    FALSE TRUE
FALSE   426   69
TRUE      5   15

And it's also not the same as setting lfcThreshold within lfcShrink, which produces "s-values", although note that this set is more similar:

> lfc2 <- lfcShrink(dds, coef=2, lfcThreshold=1, type="apeglm")
> table(test=res$padj < .1, svalue=lfc2$svalue < .005) # we use a smaller probability for s-value
svalue
test    FALSE TRUE
FALSE   486    9
TRUE      0   20

Or we can use the older version of shrinkage, called "normal", which is more strict than the first set of results.

> lfc3 <- lfcShrink(dds, coef=2, lfcThreshold=1, type="normal")
> table(test=res$padj < .1, old=lfc3$padj < .1)
old
test    FALSE TRUE
FALSE   495    0
TRUE      6   14

"My question is: should I set the threshold for LFC inside the lfcShrink() function too? In the res_setThres the p-values have changed (as mentioned in the documentation) and I eventually get no DEGs based on these thresholds."

In your code you are using type="normal", which we have seen can shrink large effect sizes too much, and so when you then perform an LFC threshold test, it is being too conservative. We have a new procedure called "apeglm" which does better than "normal" across many datasets, in particular on the larger effect sizes.

If you want to stick with an FDR bounded set, and have the posterior estimates of LFC, I would recommend to use the following (where you can include your additional arguments in "..."):

res <- results(dds, lfcThreshold=1, ...)
res <- lfcShrink(dds, coef="response_PD_vs_CR+PR", type="apeglm", res=res)

This will keep the p-values and padj from the results() call, and simply update the LFCs so they are posterior estimates.

Alternatively, you can instead use the s-value framework, which produces a set where we bound the expected number of LFCs that will have a false sign or be smaller in magnitude than 1. I recommend to use a smaller probability threshold on the s-values produced here, such as 0.005:

lfc <- lfcShrink(dds, coef="response_PD_vs_CR+PR", lfcThreshold=1, type="apeglm")

Hi Michael,

Thank you for your answer, but I'm still a little confused and have a question as described below:

By calling res <- results(dds, lfcThreshold=1, ...), we test whether the absolute LFC is larger than 1, but after shrinking LCF using res <- lfcShrink(dds, coef="response_PD_vs_CR+PR", type="apeglm", res=res), some genes with an original |LFC| > 1 and padj < 0.05 will now have a |LFC| less than 1. Are those genes still significant? Are we going to use the shrunken LFC and p/padj values for downstream analysis even though the p/padj values were based on the original LFC and do not match the shrunken LFC?

I really appreciate it if you would answer my question. Thank you very much.

You’re always going to find a few inconsistencies if you switch between the two frameworks, why not just stick with the Bayesian shrinkage framework then (skip the results function)

dds <- DESeq(dds)

res <- lfcShrink(dds, coef=coef, lfcThreshold=1)

summary(res)

Thank you for your prompt response, Michael.

My concern is that res <- lfcShrink(dds, coef=coef, lfcThreshold=1)will provide s values, which are less commonly used than p/padj values. I'm afraid that this may cause some confusion during the peer review process.

I'm considering skipping lfcShrink and just use LFC generated by the results function. Does this approach make sense to you? Thanks very much!

1

It’s really up to you. I don’t see an inconsistency between a thresholded test of an unshrunken LFC along with a shrunken LFC, which in a minority of cases may be less than the threshold. They are different procedures and estimators for different purposes, both useful output in my opinion.

I was wrong. lfcShrink will produce padj values. However, lfcShrink doesn't allow for specifying alpha values other than 0.1, whereas the results function does. Can I do res <- lfcShrink(dds, coef=coef, lfcThreshold=1) and then filter significant genes by padj < 0.05? I ask because I saw you mentioned in another post that choosing a different value for alpha than the one you use to filter is sub-optimal.

Would you please tell me which approach would be better? Use results with alpha = 0.05 or use lfcShrink and then filter by padj=0.05?

Thank you very much!

I see, there's no access to the 'alpha' argument of results() which is called inside lfcShrink. Although I have said elsewhere that it's optimal to pick the same 'alpha' that you threshold on, it's not that big of a deal between 5% and 10%. I would go with using lfcShrink and filtering on padj=.05.

Thank you very much, Michael.

0
13 days ago by
yusuf.zhc0 wrote:

res1LFC <- lfcShrink(dds, coef="time12vs_0", type="apeglm", padj=.05)

0
13 days ago by
yusuf.zhc0 wrote:

res1LFC <- lfcShrink(dds, coef="conditiontreatedvs_untreated", type="apeglm", padj=.05)