Hi all,
Trying to run DESeq2 and I'm getting many cores engaged despite setting parallel = FALSE and trying to set BiocParallel register(SerialParam(), default = TRUE).
Simply doing the following leads to 36 cores engaged, which has been problematic because I've needed the cores for other things:
library(DESeq2)
set <- readRDS('~/expressionset.rds')
dds <- DESeqDataSetFromMatrix(counts(set),
colData = pData(set),
design = ~condition)
dds <- DESeq(dds, parallel = FALSE)
Registering the SerialParam() as the default put it at the top of the list, but did not make a difference when running DESeq.
register(SerialParam(), default = TRUE)
registered()
$SerialParam
class: SerialParam
bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
$MulticoreParam
class: MulticoreParam
bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: FORK
$SnowParam
class: SnowParam
bpisup: FALSE; bpnworkers: 78; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: SOCK
I've also tried passing 1 worker to BiocParallel, but it ends up engaging more cores (about 60).
dds <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(1))
estimating size factors
estimating dispersions
gene-wise dispersion estimates: 1 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 1 workers
-- replacing outliers and refitting for 307 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
Narrowing it down, it looks like the estimateDispersions function is where the core usage gets out of hand.
dds <- estimateSizeFactors(dds) # seems to be fine
dds <- estimateDispersions(dds) # engages many cores
Any thoughts on how to deal with this? Am I setting something incorrectly? The main issue is that I'm trying to run other things in parallel so I really need DESeq2 to run single-core for this analysis.
Thank you!
David
Session info
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.24.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1 matrixStats_0.55.0
[6] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3 S4Vectors_0.22.1
[11] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.6.1 Formula_1.2-3 assertthat_0.2.1 latticeExtra_0.6-28 blob_1.2.0
[7] GenomeInfoDbData_1.2.1 pillar_1.4.2 RSQLite_2.1.2 backports_1.1.5 lattice_0.20-38 glue_1.3.1
[13] digest_0.6.21 RColorBrewer_1.1-2 XVector_0.24.0 checkmate_1.9.4 colorspace_1.4-1 htmltools_0.4.0
[19] Matrix_1.2-17 XML_3.98-1.20 pkgconfig_2.0.3 genefilter_1.66.0 zlibbioc_1.30.0 purrr_0.3.2
[25] xtable_1.8-4 scales_1.0.0 htmlTable_1.13.2 tibble_2.1.3 annotate_1.62.0 ggplot2_3.2.1
[31] nnet_7.3-12 lazyeval_0.2.2 survival_2.44-1.1 magrittr_1.5 crayon_1.3.4 memoise_1.1.0
[37] foreign_0.8-72 tools_3.6.1 data.table_1.12.4 stringr_1.4.0 locfit_1.5-9.1 munsell_0.5.0
[43] cluster_2.1.0 AnnotationDbi_1.46.1 packrat_0.5.0 compiler_3.6.1 rlang_0.4.0 grid_3.6.1
[49] RCurl_1.95-4.12 rstudioapi_0.10 htmlwidgets_1.5.1 bitops_1.0-6 base64enc_0.1-3 gtable_0.3.0
[55] DBI_1.0.0 R6_2.4.0 gridExtra_2.3 knitr_1.25 dplyr_0.8.3 bit_1.1-14
[61] zeallot_0.1.0 Hmisc_4.2-0 stringi_1.4.3 Rcpp_1.0.2 geneplotter_1.62.0 vctrs_0.2.0
[67] rpart_4.1-15 acepack_1.4.1 tidyselect_0.2.5 xfun_0.10
Hmm good idea, I tried that with the following code:
And I'm only seeing a single core is engaged.
In any case, if what you're saying is correct, does that mean something about how my R is set up is the culprit? How would I go about tracking this down?
I've only run into this with DESeq2 thus far (specifically the functions estimateDispersions varianceStabilizingTransform) but nothing else. If there is another place that would be better to post this issue, please let me know!
Thank you!
David
I don’t have any great ideas here. But there’s nothing engaging the BiocParallel infrastructure unless parallel=TRUE.
you could try
trace(bplapply)
(or maybetrace(BiocParallel::bplapply)
) before runningestimateSizeFactors()
; you'll see a trace print every timebplapply()
is evaluated. You could also add an expression totrace()
to document howbplapply()
is being called...If there's no output then definitely
BiocParallel::bplapply()
is not being called.