Question

apeglm -- influence of reference level on logFC and svalues

0

Entering edit mode

ATpoint ★ 4.1k

@atpoint-13662

Last seen 7 hours ago

Germany

Hi Mike,

I came across an issue with apeglm that you hopefully can clarify on. I hope it is not a super-naive error on my side due to lack of understanding of the method.

I ran DESeq2/apeglm on a dataset with 6 levels. In order to make the relevant contrasts accessable via resultsNames I releveled the group argument several times and re-ran nbinomWaldTest to eventually being able to feed the respective coefs to lfcShrink. The issue that I noted after playing with the data a bit was that the log2FoldChange estimates and svalues for the same contrast dramatically change depending on the reference level.

Here is a hopefully reproducible chunk of code based on this txi object:

library(DESeq2)
library(dplyr)

#/ Load that txi object from the cloud:
dds <- DESeqDataSetFromTximport(txi = txi,
                                colData = data.frame(group=gsub("_rep.", "", colnames(txi$counts)), 
                                                     stringsAsFactors = TRUE),
                                design = ~group)

dds <- DESeq2::DESeq(dds, parallel = TRUE)

#/ Version 1 with STHSC as reference level:
dds$group <- relevel(dds$group, ref="STHSC")
dds       <- nbinomWaldTest(dds)

coef1 <- which(resultsNames(dds)=="group_LTHSC_vs_STHSC")
res1  <- results(dds, name=resultsNames(dds)[coef1], lfc=.5) %>% 
           lfcShrink(dds=dds, res=., type="apeglm", coef=coef1, parallel=TRUE, svalue=TRUE) %>% 
           data.frame %>% na.omit

#/ Version 2 with LTHSC as reference level:
dds$group <- relevel(dds$group, ref="LTHSC")
dds       <- nbinomWaldTest(dds)

coef2 <- which(resultsNames(dds)=="group_STHSC_vs_LTHSC")
res2  <- results(dds, name=resultsNames(dds)[coef2], lfc=.5) %>% 
  lfcShrink(dds=dds, res=., type="apeglm", coef=coef2, parallel=TRUE, svalue=TRUE) %>% 
  data.frame %>% na.omit

The outputs are now as follows:

#/ coef1 so group_LTHSC_vs_STHSC:
> res1["Ighv1-82",]
         baseMean log2FoldChange   lfcSE       svalue
Ighv1-82 31.43666       4.538567 1.69079 0.0007843982

#/ and group_STHSC_vs_LTHSC
> res2["Ighv1-82",]
         baseMean log2FoldChange     lfcSE    svalue
Ighv1-82 31.43666     -0.1420145 0.3471301 0.1575346

#/ and the normalized counts:
> counts(dds,normalized=TRUE)["Ighv1-82",grep("LTHSC|STHSC", colnames(dds))]
LTHSC_rep1 LTHSC_rep2 LTHSC_rep3 STHSC_rep1 STHSC_rep2 STHSC_rep3 STHSC_rep5 
  2.373138 690.546337   0.000000   2.327069   0.000000  12.700701   4.699537

So the question is now whether this behaviour is expected? In my very naive understanding only the direction of the fold change should change, but this is obviously not the case. Can you clarify? sessionInfo here at this gist.

DESeq2 apeglm • 1.1k views

ADD COMMENT • link updated 3.2 years ago by Michael Love 41k • written 3.2 years ago by ATpoint ★ 4.1k

score 2 · Accepted Answer · 2021-02-03

Thanks for the reproducible example.

I wouldn't expect the shrinkage estimators to be identical when the number of levels is > 2. This is also the case with LASSO and ridge, when the number of levels is more than 2, then changing the reference levels modifies the extent of shrinkage.

The discrepancy is likely to be greatest when the counts are more sparse, that the change to the reference level is affecting the stability of the solution -- whether the LFC is closer to the MLE or shrunk to 0. You would probably eliminate the discrepant cases with a minimal filter (although I admit one point of the estimator was to help avoid using filters...):

keep <- rowSums(counts(dds) >= 10) >= 4

In general, I'd recommend to just stick to an ordering of groups if you can throughout an analysis, and then make comparisons down the ordering.