ashr vs apeglm for interaction data
Entering edit mode
Ali Barry ▴ 30
Last seen 23 days ago
United Kingdom


Currently playing with shrinkage parameters for an interaction of sex on my RNA-seq data. I've moved forward initially with apeglm, based on Zhu et al., 2018 but am curious as to what criteria others use to justify apeglm vs ashr for this sort of analysis.

My current dataset is 160 samples, but I'm narrowing focus to 4M + 4F per experimental condition in it's smallest unit. As an example, I have an FDR < 0.05 for ~ 200 genes, but apeglm shrinks all my fold changes towards zero, and ashr gives LFCs between 1 and 28 for all 200 genes. Base means are fairly low, but the surface difference reads as 'no sex interaction' with apeglm vs 'some interaction' through ashr, with the list including a number of sex-linked genes.

dds <- DESeqDataSetFromMatrix(countData = toc, colData = colData, design = ~Time_Cond*Sex)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

dds$Sex  <- relevel(dds$Sex,"F")
dds$Time_Cond  <- relevel(dds$Time_Cond,"4W_Cont")
dds <- nbinomWaldTest(dds, betaPrior=FALSE)

[1] "Intercept"                    "Time_Cond_3D_Cont_vs_4W_Cont" "Time_Cond_3D_Ipsi_vs_4W_Cont" "Time_Cond_4W_Ipsi_vs_4W_Cont"
[5] "Sex_M_vs_F"                   "Time_Cond3D_Cont.SexM"        "Time_Cond3D_Ipsi.SexM"        "Time_Cond4W_Ipsi.SexM"  

res <- results(dds, name="Time_Cond4W_Ipsi.SexM", cooksCutoff=TRUE, filterFun=ihw)

results_apeglm <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="apeglm")
results_ashr   <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="ashr")

### annotated DEGs (FDR < 0.05)
> head(results_apeglm_all)
log2 fold change (MAP): Time Cond4W Ipsi.SexM 
Wald test p-value: Time Cond4W Ipsi.SexM 
DataFrame with 212 rows and 6 columns
                    baseMean log2FoldChange      lfcSE      pvalue        padj      symbol
                   <numeric>      <numeric>  <numeric>   <numeric>   <numeric> <character>
ENSMUSG00000005237   19.8251   -2.55126e-06  0.0014427 2.60194e-04  0.02807446       Dnah2
ENSMUSG00000092384   14.3061    2.53761e-06  0.0014427 3.09149e-05  0.00175162          NA
ENSMUSG00000024056   11.9862   -2.18011e-06  0.0014427 2.46566e-06  0.00030016       Ndc80
ENSMUSG00000053749  115.3782    2.14070e-06  0.0014427 2.82711e-05  0.00993435      Gm9920
ENSMUSG00000043913   90.3787    1.94794e-06  0.0014427 3.46138e-05  0.00881880      Ccdc60

log2 fold change (MMSE): Time Cond4W Ipsi.SexM 
Wald test p-value: Time Cond4W Ipsi.SexM 
DataFrame with 212 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      pvalue        padj      symbol
                   <numeric>      <numeric> <numeric>   <numeric>   <numeric> <character>
ENSMUSG00000115151   5.92225        29.0170   2.70542 1.08325e-27 4.01186e-24          NA
ENSMUSG00000040985   6.34427        28.8085   2.97286 4.68893e-23 1.39236e-19        Sun3
ENSMUSG00000104266   2.78937       -28.7404   3.06162 8.74850e-22 5.02753e-18          NA
ENSMUSG00000117896   6.09515       -28.7338   2.95905 3.89762e-23 1.00295e-19    AA387883
ENSMUSG00000110747  12.41885       -28.7184   2.35732 5.72117e-35 6.81046e-31          NA
lfcShrink DESeq2 • 221 views
Entering edit mode
Last seen 45 minutes ago
United States

Can you show the MA plot for these two analyses?

Also for a gene with a large LFC from ashr, can you show plotCounts?

Entering edit mode

Hi Michael,

Plots attached. Many of the large fold changes with ashr are driven by one sample per group, but a few are maybe a bit more convincing. I've included a spread of examples.

MA plots

apeglm (left), ashr (right)

apeglm shrinkage ashr shrinkage

gene1 gene3 gene4 gene5

Entering edit mode

I see why apeglm is conservative— the dispersion is very high within group.

An idea would be to use ashr but do some additional filtering so all genes must have 5 or more positive counts. Otherwise it’s hard to say what’s due to noise and what’s DE

Entering edit mode

Thanks, I'll trying playing a bit there. I have done some minor filtering already (requiring at least 5 counts in 10% of my samples, across all 160 samples), because I was running into convergence issues with my glm for a small number of genes (~10 genes). There's definitely space to filter within subgroups a bit more though.


Login before adding your answer.

Traffic: 267 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6