Unexpected MAplot after apeglm shrinkage in DESeq2
1
0
Entering edit mode
ATpoint ★ 4.5k
@atpoint-13662
Last seen 8 hours ago
Germany

Dear Michael,

I am analyzing an RNA-seq cohort of about 90 patients for differential expression between two disease subtypes. Low-level processing was done with Salmon/tximport followed by the standard workflow with DESeq2, using the disease subtype as the only covariate in the design. I was quiet surprised that in the MAplot with apeglm, there was an absence of the "typical" cloud of data points around LFC ~ 0, especially in comparison with the default method. Is this normal and expected?

 

 

I uploaded the Rdata object and a minimal script if that helps. If you need further details, please let me know.

best wishes,

Alexander

 

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: CentOS Linux 7 (Core)

Matrix products: default

BLAS: foo/anaconda3/lib/R/lib/libRblas.so

LAPACK: foo/anaconda3/lib/R/lib/libRlapack.so

locale:

[1] en_US.UTF-8

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:

 [1] apeglm_1.2.1                DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.6          BiocParallel_1.14.2        

 [6] matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.7        GenomeInfoDb_1.16.0         IRanges_2.14.12            

[11] S4Vectors_0.18.3            BiocGenerics_0.26.0         data.table_1.11.8           RevoUtils_11.0.1            RevoUtilsMath_11.0.0       

loaded via a namespace (and not attached):

 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3          assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1             GenomeInfoDbData_1.1.0

 [8] yaml_2.2.0             numDeriv_2016.8-1      pillar_1.3.0           RSQLite_2.1.1          backports_1.1.2        lattice_0.20-35        glue_1.3.0            

[15] bbmle_1.0.20           digest_0.6.18          RColorBrewer_1.1-2     XVector_0.20.0         checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6       

[22] Matrix_1.2-14          plyr_1.8.4             XML_3.98-1.16          pkgconfig_2.0.2        emdbook_1.3.10         genefilter_1.62.0      zlibbioc_1.26.0       

[29] purrr_0.2.5            xtable_1.8-3           scales_1.0.0           htmlTable_1.12         tibble_1.4.2           annotate_1.58.0        ggplot2_3.1.0         

[36] nnet_7.3-12            lazyeval_0.2.1         survival_2.42-6        magrittr_1.5           crayon_1.3.4           memoise_1.1.0          MASS_7.3-51           

[43] foreign_0.8-71         tools_3.5.1            stringr_1.3.1          locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.42.1  

[50] bindrcpp_0.2.2         compiler_3.5.1         rlang_0.3.0.1          grid_3.5.1             RCurl_1.95-4.11        rstudioapi_0.8         htmlwidgets_1.3       

[57] bitops_1.0-6           base64enc_0.1-3        gtable_0.2.0           DBI_1.0.0              R6_2.3.0               gridExtra_2.3          knitr_1.20            

[64] dplyr_0.7.7            bit_1.1-14             bindr_0.1.1            Hmisc_4.1-1            stringi_1.2.4          Rcpp_0.12.19           geneplotter_1.58.0    

[71] rpart_4.1-13           acepack_1.4.1          coda_0.19-2            tidyselect_0.2.5     
apeglm deseq2 rna-seq • 2.4k views
ADD COMMENT
0
Entering edit mode

I'll take a look. Thanks. One difference is that estimated LFCs that are compatible with LFC=0 are moved more toward 0 by apeglm, where the Normal prior tends to move them only slightly. But I'll look to see if I see something about the code that is more informative here.

ADD REPLY
4
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

hi Alexander,

I think because of the large sample size, you'll get better dispersion estimates, and then better shrinkage estimates, by subsetting off a number of the small count genes.

I got more reasonable shrinkage plots with the following code:

library(DESeq2)
library(apeglm)
load("bioc.RData")
dlbcl.dds$tmp.coo <- relevel(dlbcl.dds$tmp.coo, ref = "GCB")
dds <- dlbcl.dds
dds$condition <- dds$tmp.coo
design(dds) <- ~condition
table(dds$condition)

keep <- rowSums(counts(dds) >= 10) >= 5
table(keep)
dds <- dds[keep,]

vsd <- vst(dds)
plotPCA(vsd, "condition")

dds <- DESeq(dds, minRep=Inf) # ~4 min

res <- results(dds)
summary(res)
plotMA(res, ylim=c(-5,5))

res2 <- lfcShrink(dds, coef=3, type="normal")
res3 <- lfcShrink(dds, coef=3, type="apeglm")
res4 <- lfcShrink(dds, coef=3, type="ashr")

plotMA(res2, ylim=c(-5,5))
plotMA(res3, ylim=c(-5,5))
plotMA(res4, ylim=c(-5,5))
ADD COMMENT
0
Entering edit mode

Thank you. Just out of interest, because I encountered this problem also on a different and unrelated dataset with larger sample sizes, suggesting that this is a rather general issue when n becomes large. Do you plan to make changes in future versions of apeglm to compensate for this issue internally without the need for external filtering?

ADD REPLY
0
Entering edit mode

No we probably won’t ever do minimal filtering inside the function but instead I prefer that any filtering if it is being used can be seen by looking at the script.

ADD REPLY

Login before adding your answer.

Traffic: 675 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6