Problem
I posted this question. After a few tries and errors, I found that dream()
requires unrealistically large memory. For example, to analyze my 34,155 (genes) x 27,584 (samples) count matrix using the dream
pipeline on 4 forked threads, even 500GB RAM is not enough even though the count matrix is just 3.4GB. Based on my understanding of what dream()
does (I read this paper), the modeling procedure for each gene is completely independent of the other genes once the precision weights have been calculated by voomWithDreamWeights()
. Hence I think there should be no reason for this excessive memory requirement of dream()
.
Below is the 'dream
' part of my script.
dgelist = DGEList(counts = count_data,
samples = sample_metadata,
group = sample_metadata$sex,
genes = gene_metadata)
k = filterByExpr(dgelist)
dgelist = dgelist[k, , keep.lib.sizes = F]
dgelist = calcNormFactors(dgelist, method = "TMM")
form = ~ sex + tissue_cancer + batch + (1|case_id)
v = voomWithDreamWeights(dgelist, form, sample_metadata, BPPARAM = MulticoreParam(12))
dfit = dream(v, form, sample_metadata, computeResiduals = F, BPPARAM = MulticoreParam(4)) # this takes >500GB memory
Questions
So here are my questions.
- Is it correct that what
dream()
does for each gene is completely independent of other genes? - If I manually split the
v
object into 100 gene subsets (for example 300 genes per each split) and rundream()
for each subset separately, will the results be the same as if I ran it as a whole? - Is there any way to reduce the memory requirement of the above script?
Any advice would be very much appreciated!
Session Info
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 variancePartition_1.20.0 BiocParallel_1.24.1 forcats_0.5.1 stringr_1.4.0
[6] dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.1
[11] tidyverse_1.3.1 scales_1.1.1 rstudioapi_0.13 magrittr_2.0.1 ggpubr_0.4.0
[16] ggplot2_3.3.3 fst_0.9.4 edgeR_3.32.1 limma_3.46.0 e1071_1.7-6
[21] dtplyr_1.1.0 DESeq2_1.30.1 SummarizedExperiment_1.20.0 Biobase_2.50.0 MatrixGenerics_1.2.1
[26] matrixStats_0.58.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[31] BiocGenerics_0.36.1 data.table_1.14.0 cowplot_1.1.1
loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-0 ggsignif_0.6.1 ellipsis_0.3.1 class_7.3-18 rio_0.5.26
[7] colorRamps_2.3 XVector_0.30.0 fs_1.5.0 proxy_0.4-25 bit64_4.0.5 AnnotationDbi_1.52.0
[13] fansi_0.4.2 lubridate_1.7.10 xml2_1.3.2 codetools_0.2-18 splines_4.0.4 doParallel_1.0.16
[19] cachem_1.0.4 geneplotter_1.68.0 knitr_1.33 jsonlite_1.7.2 nloptr_1.2.2.2 pbkrtest_0.5.1
[25] broom_0.7.6 annotate_1.68.0 dbplyr_2.1.1 compiler_4.0.4 httr_1.4.2 backports_1.2.1
[31] assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0 cli_2.4.0 prettyunits_1.1.1 tools_4.0.4
[37] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 reshape2_1.4.4 Rcpp_1.0.6 carData_3.0-4
[43] cellranger_1.1.0 vctrs_0.3.7 nlme_3.1-152 iterators_1.0.13 xfun_0.22 lme4_1.1-26
[49] openxlsx_4.2.3 rvest_1.0.0 lifecycle_1.0.0 gtools_3.8.2 statmod_1.4.35 rstatix_0.7.0
[55] XML_3.99-0.6 MASS_7.3-53.1 zlibbioc_1.36.0 hms_1.0.0 curl_4.3 memoise_2.0.0
[61] stringi_1.5.3 RSQLite_2.2.7 genefilter_1.72.1 foreach_1.5.1 caTools_1.18.2 boot_1.3-27
[67] zip_2.1.1 rlang_0.4.10 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-41 bit_4.0.4
[73] tidyselect_1.1.0 plyr_1.8.6 R6_2.5.0 gplots_3.1.1 generics_0.1.0 DelayedArray_0.16.3
[79] DBI_1.1.1 pillar_1.6.0 haven_2.4.1 foreign_0.8-81 withr_2.4.2 survival_3.2-10
[85] abind_1.4-5 RCurl_1.98-1.3 modelr_0.1.8 crayon_1.4.1 car_3.0-10 KernSmooth_2.23-18
[91] utf8_1.2.1 progress_1.2.2 locfit_1.5-9.4 grid_4.0.4 readxl_1.3.1 blob_1.2.1
[97] reprex_2.0.0 xtable_1.8-4 munsell_0.5.0