Question

Unexpected MAplot after apeglm shrinkage in DESeq2

0

Entering edit mode

ATpoint ★ 4.1k

@atpoint-13662

Last seen 6 hours ago

Germany

Dear Michael,

I am analyzing an RNA-seq cohort of about 90 patients for differential expression between two disease subtypes. Low-level processing was done with Salmon/tximport followed by the standard workflow with DESeq2, using the disease subtype as the only covariate in the design. I was quiet surprised that in the MAplot with apeglm, there was an absence of the "typical" cloud of data points around LFC ~ 0, especially in comparison with the default method. Is this normal and expected?

I uploaded the Rdata object and a minimal script if that helps. If you need further details, please let me know.

best wishes,

Alexander

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: CentOS Linux 7 (Core)

Matrix products: default

BLAS: foo/anaconda3/lib/R/lib/libRblas.so

LAPACK: foo/anaconda3/lib/R/lib/libRlapack.so

locale:

[1] en_US.UTF-8

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:

 [1] apeglm_1.2.1                DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.6          BiocParallel_1.14.2        

 [6] matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.7        GenomeInfoDb_1.16.0         IRanges_2.14.12            

[11] S4Vectors_0.18.3            BiocGenerics_0.26.0         data.table_1.11.8           RevoUtils_11.0.1            RevoUtilsMath_11.0.0       

loaded via a namespace (and not attached):

 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3          assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1             GenomeInfoDbData_1.1.0

 [8] yaml_2.2.0             numDeriv_2016.8-1      pillar_1.3.0           RSQLite_2.1.1          backports_1.1.2        lattice_0.20-35        glue_1.3.0            

[15] bbmle_1.0.20           digest_0.6.18          RColorBrewer_1.1-2     XVector_0.20.0         checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6       

[22] Matrix_1.2-14          plyr_1.8.4             XML_3.98-1.16          pkgconfig_2.0.2        emdbook_1.3.10         genefilter_1.62.0      zlibbioc_1.26.0       

[29] purrr_0.2.5            xtable_1.8-3           scales_1.0.0           htmlTable_1.12         tibble_1.4.2           annotate_1.58.0        ggplot2_3.1.0         

[36] nnet_7.3-12            lazyeval_0.2.1         survival_2.42-6        magrittr_1.5           crayon_1.3.4           memoise_1.1.0          MASS_7.3-51           

[43] foreign_0.8-71         tools_3.5.1            stringr_1.3.1          locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.42.1  

[50] bindrcpp_0.2.2         compiler_3.5.1         rlang_0.3.0.1          grid_3.5.1             RCurl_1.95-4.11        rstudioapi_0.8         htmlwidgets_1.3       

[57] bitops_1.0-6           base64enc_0.1-3        gtable_0.2.0           DBI_1.0.0              R6_2.3.0               gridExtra_2.3          knitr_1.20            

[64] dplyr_0.7.7            bit_1.1-14             bindr_0.1.1            Hmisc_4.1-1            stringi_1.2.4          Rcpp_0.12.19           geneplotter_1.58.0    

[71] rpart_4.1-13           acepack_1.4.1          coda_0.19-2            tidyselect_0.2.5

apeglm deseq2 rna-seq • 2.1k views

ADD COMMENT • link updated 5.5 years ago by Michael Love 41k • written 5.5 years ago by ATpoint ★ 4.1k

0

Entering edit mode

I'll take a look. Thanks. One difference is that estimated LFCs that are compatible with LFC=0 are moved more toward 0 by apeglm, where the Normal prior tends to move them only slightly. But I'll look to see if I see something about the code that is more informative here.

ADD REPLY • link 5.5 years ago Michael Love 41k

score 4 · Accepted Answer · 2018-10-25

4

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

hi Alexander,

I think because of the large sample size, you'll get better dispersion estimates, and then better shrinkage estimates, by subsetting off a number of the small count genes.

I got more reasonable shrinkage plots with the following code:

library(DESeq2)
library(apeglm)
load("bioc.RData")
dlbcl.dds$tmp.coo <- relevel(dlbcl.dds$tmp.coo, ref = "GCB")
dds <- dlbcl.dds
dds$condition <- dds$tmp.coo
design(dds) <- ~condition
table(dds$condition)

keep <- rowSums(counts(dds) >= 10) >= 5
table(keep)
dds <- dds[keep,]

vsd <- vst(dds)
plotPCA(vsd, "condition")

dds <- DESeq(dds, minRep=Inf) # ~4 min

res <- results(dds)
summary(res)
plotMA(res, ylim=c(-5,5))

res2 <- lfcShrink(dds, coef=3, type="normal")
res3 <- lfcShrink(dds, coef=3, type="apeglm")
res4 <- lfcShrink(dds, coef=3, type="ashr")

plotMA(res2, ylim=c(-5,5))
plotMA(res3, ylim=c(-5,5))
plotMA(res4, ylim=c(-5,5))

ADD COMMENT • link 5.5 years ago Michael Love 41k

0

Entering edit mode

Thank you. Just out of interest, because I encountered this problem also on a different and unrelated dataset with larger sample sizes, suggesting that this is a rather general issue when n becomes large. Do you plan to make changes in future versions of apeglm to compensate for this issue internally without the need for external filtering?

ADD REPLY • link 5.3 years ago ATpoint ★ 4.1k

0

Entering edit mode

No we probably won’t ever do minimal filtering inside the function but instead I prefer that any filtering if it is being used can be seen by looking at the script.

ADD REPLY • link 5.3 years ago Michael Love 41k