How to set up the Wald test testing for log2 fold change above a threshold, and interpret the result correctly?
1
1
Entering edit mode
Alan ▴ 20
@alan-15011
Last seen 5.9 years ago

Hi Michael and community,

My question this time is about how to set up the Wald test log2 fold change above a threshold, and how to interpret the result correctly.

#1.
Take the pasilla dataset (in the DESeq2 vignette) as an example, if I am interested only in genes that show |log2 FC| >2, and I want to control for FDR <0.05, then I write:
  dds <- DESeq(dds)
  res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
  resLFC <- lfcShrink(dds, coef = 2, res=res, lfcThreshold=2)

Is above code correct? In addition, do I need to put the lfcThreshold=2 argument in both results() step, and the lfcShrink() step?


#2.
I tried it on my computer, and the first few lines of the "resLFC" object (in the order of smallest FDR) look like below:
                         baseMean    log2FoldChange     lfcSE        stat          pvalue           padj
FBgn0039155   730.567673   -3.686017982 0.12624706 -15.483906 4.455761e-54 4.420115e-50
FBgn0039827   261.911170   -2.841617881 0.14698260  -9.296996 1.444700e-20 7.165711e-17
FBgn0003360  4342.832062   -2.662356807 0.12047274  -8.215957 2.104796e-16 6.959859e-13
FBgn0025111  1501.447930    2.528934547 0.11059930   7.066198 1.592366e-12 3.893356e-09
FBgn0034736   225.870672   -2.492829764 0.14465009  -7.037131 1.962377e-12 3.893356e-09
FBgn0034434   114.623344   -2.227108812 0.15598919  -5.900835 3.616673e-09 5.310345e-06
FBgn0085359    68.606108   -1.856196452 0.16544745  -5.894982 3.747219e-09 5.310345e-06
FBgn0024288    58.849488   -1.911765311 0.16493679  -5.563957 2.637244e-08 3.270182e-05

How to interpret the last two genes(FBgn0085359, FBgn0024288) here? I mean, they both had significant FDR, but their |log2 FC| is not above my threshold of 2. So are they still significant DE genes?

#3.
I've studied the paper and the vignette, from them, I get the idea that log2 FC shrinkage is to overcome the low count-high variance issue, and is more preferable for ranking and visualization. So in a formal report(eg. publication), we should report the shrunken log2 FC, not the original log2 FC, is this correct?

As always, thank you so much for your time and explanation, Michael!


Alan

R version 3.4.3 (2017-11-30), DESeq2_1.18.1 

deseq2 • 1.6k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 4 days ago
United States

hi Alan,

These are good questions. I only just added lfcThreshold to lfcShrink in the development branch (v1.19), so it won't be out in the release until April. It will work with type="normal" and type="apeglm". In version 1.18 it is not used by lfcShrink().

The reason those two have smaller |LFC| than 2 is because type="normal" is a bit too aggressive about shrinkage. Those genes likely do have |LFC| > 2. Note that, type="normal" is still good for ranking genes by effect size, better than the MLE LFCs or LFCs computed with pseudocounts according to our testing. [MLE = maximum likelihood estimate, this is the one being produced by DESeq() now, before running lfcShrink().] The newer method, type="apeglm", seems better all around, good for ranking genes by effect size and accurate at producing the LFC point estimate, and we are working on a manuscript to describe the new method and show its performance. Also you can use type="ashr" which also will not overly shrink the larger LFCs.

You can really choose what you like for the report. I'd recommend the two newer shrinkage methods (apeglm or ashr), as they are accurate for the large LFCs and have much less error on the imprecise LFCs than the MLE.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thank you very much indeed! A few follow up questions on this same LFC shrinkage issue: (I decide to go with apeglm after some reading)
#1.
Just to clarify: again, take the pasilla dataset as an example, if I wanna use the apeglm method to shrink LFC, and I am interested only in genes that show |log2 FC| >2, plus controling for FDR <0.05, is this the correct way to code?
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
resApe <- lfcShrink(dds, coef=2, res=res, type="apeglm")

#2. If I go with the apeglm method to shrink LFC, then it's not possible to get genes with FDR < 0.05 that have |LFC| <2, right? (at least for the above code with pasilla dataset, all FDR < 0.05 genes have |LFC| >2...)

#3. When using the lfcShrink() function, if I wanna keep the FDR throeshold of 0.05, then I need to include the res = res argument, right? (because I noted that if I do not include this argument, then the summary(resApe) would still show "adjusted p-value < 0.1")

Thank you for your advice and help!

Alan

ADD REPLY
1
Entering edit mode

I'll try to write it again:

1) yes, but in 1.20 you can use lfcThreshold in lfcShrink, and it will compute statistics directly on the posterior distribution for the LFC.

2) in theory it could happen, but practically i think it won't, because apeglm preserves large LFCs. And following the answer in (1), it would never happen if you computed statistics on the posterior LFC.

3) Yes you should pass res=res for customizing the results call, because if you don't lfcShrink uses default arguments to results internally.

ADD REPLY
0
Entering edit mode

Thank you very much, Michael! I am very sorry for making you write again. I used another computer at first, and so I posted again. Thank you for your patience and explanation!   -Alan

ADD REPLY
0
Entering edit mode

Did you receive my answer? It looks like it got deleted.

ADD REPLY

Login before adding your answer.

Traffic: 573 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6