Question

How to set up the Wald test testing for log2 fold change above a threshold, and interpret the result correctly?

1

Entering edit mode

Alan ▴ 20

@alan-15011

Last seen 5.9 years ago

Hi Michael and community,

My question this time is about how to set up the Wald test log2 fold change above a threshold, and how to interpret the result correctly.

#1.
Take the pasilla dataset (in the DESeq2 vignette) as an example, if I am interested only in genes that show |log2 FC| >2, and I want to control for FDR <0.05, then I write:
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
resLFC <- lfcShrink(dds, coef = 2, res=res, lfcThreshold=2)

Is above code correct? In addition, do I need to put the lfcThreshold=2 argument in both results() step, and the lfcShrink() step?

#2.
I tried it on my computer, and the first few lines of the "resLFC" object (in the order of smallest FDR) look like below:
         baseMean    log2FoldChange     lfcSE      stat      pvalue       padj
FBgn0039155   730.567673   -3.686017982 0.12624706 -15.483906 4.455761e-54 4.420115e-50
FBgn0039827   261.911170   -2.841617881 0.14698260 -9.296996 1.444700e-20 7.165711e-17
FBgn0003360 4342.832062   -2.662356807 0.12047274 -8.215957 2.104796e-16 6.959859e-13
FBgn0025111 1501.447930    2.528934547 0.11059930   7.066198 1.592366e-12 3.893356e-09
FBgn0034736   225.870672   -2.492829764 0.14465009 -7.037131 1.962377e-12 3.893356e-09
FBgn0034434   114.623344   -2.227108812 0.15598919 -5.900835 3.616673e-09 5.310345e-06
FBgn0085359    68.606108   -1.856196452 0.16544745 -5.894982 3.747219e-09 5.310345e-06
FBgn0024288    58.849488   -1.911765311 0.16493679 -5.563957 2.637244e-08 3.270182e-05

How to interpret the last two genes(FBgn0085359, FBgn0024288) here? I mean, they both had significant FDR, but their |log2 FC| is not above my threshold of 2. So are they still significant DE genes?

#3.
I've studied the paper and the vignette, from them, I get the idea that log2 FC shrinkage is to overcome the low count-high variance issue, and is more preferable for ranking and visualization. So in a formal report(eg. publication), we should report the shrunken log2 FC, not the original log2 FC, is this correct?

As always, thank you so much for your time and explanation, Michael!

Alan

R version 3.4.3 (2017-11-30), DESeq2_1.18.1

deseq2 • 1.6k views

ADD COMMENT • link updated 6.8 years ago by Michael Love 43k • written 6.8 years ago by Alan ▴ 20

score 1 · Answer 1 · 2018-02-16

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

hi Alan,

These are good questions. I only just added lfcThreshold to lfcShrink in the development branch (v1.19), so it won't be out in the release until April. It will work with type="normal" and type="apeglm". In version 1.18 it is not used by lfcShrink().

The reason those two have smaller |LFC| than 2 is because type="normal" is a bit too aggressive about shrinkage. Those genes likely do have |LFC| > 2. Note that, type="normal" is still good for ranking genes by effect size, better than the MLE LFCs or LFCs computed with pseudocounts according to our testing. [MLE = maximum likelihood estimate, this is the one being produced by DESeq() now, before running lfcShrink().] The newer method, type="apeglm", seems better all around, good for ranking genes by effect size and accurate at producing the LFC point estimate, and we are working on a manuscript to describe the new method and show its performance. Also you can use type="ashr" which also will not overly shrink the larger LFCs.

You can really choose what you like for the report. I'd recommend the two newer shrinkage methods (apeglm or ashr), as they are accurate for the large LFCs and have much less error on the imprecise LFCs than the MLE.

ADD COMMENT • link 6.8 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thank you very much indeed! A few follow up questions on this same LFC shrinkage issue: (I decide to go with apeglm after some reading)
#1.
Just to clarify: again, take the pasilla dataset as an example, if I wanna use the apeglm method to shrink LFC, and I am interested only in genes that show |log2 FC| >2, plus controling for FDR <0.05, is this the correct way to code?
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
resApe <- lfcShrink(dds, coef=2, res=res, type="apeglm")

#2. If I go with the apeglm method to shrink LFC, then it's not possible to get genes with FDR < 0.05 that have |LFC| <2, right? (at least for the above code with pasilla dataset, all FDR < 0.05 genes have |LFC| >2...)

#3. When using the lfcShrink() function, if I wanna keep the FDR throeshold of 0.05, then I need to include the res = res argument, right? (because I noted that if I do not include this argument, then the summary(resApe) would still show "adjusted p-value < 0.1")

Thank you for your advice and help!

Alan

ADD REPLY • link 6.8 years ago Alan ▴ 20

1

Entering edit mode

I'll try to write it again:

1) yes, but in 1.20 you can use lfcThreshold in lfcShrink, and it will compute statistics directly on the posterior distribution for the LFC.

2) in theory it could happen, but practically i think it won't, because apeglm preserves large LFCs. And following the answer in (1), it would never happen if you computed statistics on the posterior LFC.

3) Yes you should pass res=res for customizing the results call, because if you don't lfcShrink uses default arguments to results internally.

ADD REPLY • link 6.8 years ago Michael Love 43k

0

Entering edit mode

Thank you very much, Michael! I am very sorry for making you write again. I used another computer at first, and so I posted again. Thank you for your patience and explanation! -Alan