Question

ashr vs apeglm for interaction data

1

Entering edit mode

Ali Barry ▴ 40

@2f691b31

Last seen 11 months ago

United Kingdom

Hi,

Currently playing with shrinkage parameters for an interaction of sex on my RNA-seq data. I've moved forward initially with apeglm, based on Zhu et al., 2018 but am curious as to what criteria others use to justify apeglm vs ashr for this sort of analysis.

My current dataset is 160 samples, but I'm narrowing focus to 4M + 4F per experimental condition in it's smallest unit. As an example, I have an FDR < 0.05 for ~ 200 genes, but apeglm shrinks all my fold changes towards zero, and ashr gives LFCs between 1 and 28 for all 200 genes. Base means are fairly low, but the surface difference reads as 'no sex interaction' with apeglm vs 'some interaction' through ashr, with the list including a number of sex-linked genes.

dds <- DESeqDataSetFromMatrix(countData = toc, colData = colData, design = ~Time_Cond*Sex)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

dds$Sex  <- relevel(dds$Sex,"F")
dds$Time_Cond  <- relevel(dds$Time_Cond,"4W_Cont")
dds <- nbinomWaldTest(dds, betaPrior=FALSE)

resultsNames(dds)
[1] "Intercept"                    "Time_Cond_3D_Cont_vs_4W_Cont" "Time_Cond_3D_Ipsi_vs_4W_Cont" "Time_Cond_4W_Ipsi_vs_4W_Cont"
[5] "Sex_M_vs_F"                   "Time_Cond3D_Cont.SexM"        "Time_Cond3D_Ipsi.SexM"        "Time_Cond4W_Ipsi.SexM"  

res <- results(dds, name="Time_Cond4W_Ipsi.SexM", cooksCutoff=TRUE, filterFun=ihw)

results_apeglm <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="apeglm")
results_ashr   <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="ashr")

### annotated DEGs (FDR < 0.05)
> head(results_apeglm_all)
$sig_res
log2 fold change (MAP): Time Cond4W Ipsi.SexM 
Wald test p-value: Time Cond4W Ipsi.SexM 
DataFrame with 212 rows and 6 columns
                    baseMean log2FoldChange      lfcSE      pvalue        padj      symbol
                   <numeric>      <numeric>  <numeric>   <numeric>   <numeric> <character>
ENSMUSG00000005237   19.8251   -2.55126e-06  0.0014427 2.60194e-04  0.02807446       Dnah2
ENSMUSG00000092384   14.3061    2.53761e-06  0.0014427 3.09149e-05  0.00175162          NA
ENSMUSG00000024056   11.9862   -2.18011e-06  0.0014427 2.46566e-06  0.00030016       Ndc80
ENSMUSG00000053749  115.3782    2.14070e-06  0.0014427 2.82711e-05  0.00993435      Gm9920
ENSMUSG00000043913   90.3787    1.94794e-06  0.0014427 3.46138e-05  0.00881880      Ccdc60

head(results_ashr_all)
$sig_res
log2 fold change (MMSE): Time Cond4W Ipsi.SexM 
Wald test p-value: Time Cond4W Ipsi.SexM 
DataFrame with 212 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      pvalue        padj      symbol
                   <numeric>      <numeric> <numeric>   <numeric>   <numeric> <character>
ENSMUSG00000115151   5.92225        29.0170   2.70542 1.08325e-27 4.01186e-24          NA
ENSMUSG00000040985   6.34427        28.8085   2.97286 4.68893e-23 1.39236e-19        Sun3
ENSMUSG00000104266   2.78937       -28.7404   3.06162 8.74850e-22 5.02753e-18          NA
ENSMUSG00000117896   6.09515       -28.7338   2.95905 3.89762e-23 1.00295e-19    AA387883
ENSMUSG00000110747  12.41885       -28.7184   2.35732 5.72117e-35 6.81046e-31          NA

lfcShrink DESeq2 • 4.0k views

ADD COMMENT • link 3.7 years ago • updated 3.6 years ago Ali Barry ▴ 40

score 0 · Answer 1 · 2022-05-20

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 12 hours ago

United States

Can you show the MA plot for these two analyses?

Also for a gene with a large LFC from ashr, can you show plotCounts?

ADD COMMENT • link 3.6 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Plots attached. Many of the large fold changes with ashr are driven by one sample per group, but a few are maybe a bit more convincing. I've included a spread of examples.

MA plots

apeglm (left), ashr (right)

apeglm shrinkage ashr shrinkage

gene1 gene3 gene4 gene5

ADD REPLY • link 3.6 years ago Ali Barry ▴ 40

1

Entering edit mode

I see why apeglm is conservative— the dispersion is very high within group.

An idea would be to use ashr but do some additional filtering so all genes must have 5 or more positive counts. Otherwise it’s hard to say what’s due to noise and what’s DE

ADD REPLY • link 3.6 years ago Michael Love 43k

0

Entering edit mode

Thanks, I'll trying playing a bit there. I have done some minor filtering already (requiring at least 5 counts in 10% of my samples, across all 160 samples), because I was running into convergence issues with my glm for a small number of genes (~10 genes). There's definitely space to filter within subgroups a bit more though.

ADD REPLY • link 3.6 years ago Ali Barry ▴ 40