Hi Michael and community,
My question this time is about how to set up the Wald test log2 fold change above a threshold, and how to interpret the result correctly.
#1.
Take the pasilla dataset (in the DESeq2 vignette) as an example, if I am interested only in genes that show |log2 FC| >2, and I want to control for FDR <0.05, then I write:
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
resLFC <- lfcShrink(dds, coef = 2, res=res, lfcThreshold=2)
Is above code correct? In addition, do I need to put the lfcThreshold=2 argument in both results() step, and the lfcShrink() step?
#2.
I tried it on my computer, and the first few lines of the "resLFC" object (in the order of smallest FDR) look like below:
baseMean log2FoldChange lfcSE stat pvalue padj
FBgn0039155 730.567673 -3.686017982 0.12624706 -15.483906 4.455761e-54 4.420115e-50
FBgn0039827 261.911170 -2.841617881 0.14698260 -9.296996 1.444700e-20 7.165711e-17
FBgn0003360 4342.832062 -2.662356807 0.12047274 -8.215957 2.104796e-16 6.959859e-13
FBgn0025111 1501.447930 2.528934547 0.11059930 7.066198 1.592366e-12 3.893356e-09
FBgn0034736 225.870672 -2.492829764 0.14465009 -7.037131 1.962377e-12 3.893356e-09
FBgn0034434 114.623344 -2.227108812 0.15598919 -5.900835 3.616673e-09 5.310345e-06
FBgn0085359 68.606108 -1.856196452 0.16544745 -5.894982 3.747219e-09 5.310345e-06
FBgn0024288 58.849488 -1.911765311 0.16493679 -5.563957 2.637244e-08 3.270182e-05
How to interpret the last two genes(FBgn0085359, FBgn0024288) here? I mean, they both had significant FDR, but their |log2 FC| is not above my threshold of 2. So are they still significant DE genes?
#3.
I've studied the paper and the vignette, from them, I get the idea that log2 FC shrinkage is to overcome the low count-high variance issue, and is more preferable for ranking and visualization. So in a formal report(eg. publication), we should report the shrunken log2 FC, not the original log2 FC, is this correct?
As always, thank you so much for your time and explanation, Michael!
Alan
R version 3.4.3 (2017-11-30), DESeq2_1.18.1
Hi Michael,
Thank you very much indeed! A few follow up questions on this same LFC shrinkage issue: (I decide to go with apeglm after some reading)
#1.
Just to clarify: again, take the pasilla dataset as an example, if I wanna use the apeglm method to shrink LFC, and I am interested only in genes that show |log2 FC| >2, plus controling for FDR <0.05, is this the correct way to code?
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05, lfcThreshold = 2, altHypothesis="greaterAbs")
resApe <- lfcShrink(dds, coef=2, res=res, type="apeglm")
#2. If I go with the apeglm method to shrink LFC, then it's not possible to get genes with FDR < 0.05 that have |LFC| <2, right? (at least for the above code with pasilla dataset, all FDR < 0.05 genes have |LFC| >2...)
#3. When using the lfcShrink() function, if I wanna keep the FDR throeshold of 0.05, then I need to include the res = res argument, right? (because I noted that if I do not include this argument, then the summary(resApe) would still show "adjusted p-value < 0.1")
Thank you for your advice and help!
Alan
I'll try to write it again:
1) yes, but in 1.20 you can use lfcThreshold in lfcShrink, and it will compute statistics directly on the posterior distribution for the LFC.
2) in theory it could happen, but practically i think it won't, because apeglm preserves large LFCs. And following the answer in (1), it would never happen if you computed statistics on the posterior LFC.
3) Yes you should pass res=res for customizing the results call, because if you don't lfcShrink uses default arguments to results internally.
Thank you very much, Michael! I am very sorry for making you write again. I used another computer at first, and so I posted again. Thank you for your patience and explanation! -Alan
Did you receive my answer? It looks like it got deleted.