Question: DESeq2 using many cores despite parallel = FALSE
0
gravatar for symbiologist
12 days ago by
symbiologist0 wrote:

Hi all,

Trying to run DESeq2 and I'm getting many cores engaged despite setting parallel = FALSE and trying to set BiocParallel register(SerialParam(), default = TRUE).

Simply doing the following leads to 36 cores engaged, which has been problematic because I've needed the cores for other things:

library(DESeq2)
set <- readRDS('~/expressionset.rds')

dds <- DESeqDataSetFromMatrix(counts(set),
                              colData = pData(set),
                              design = ~condition) 

dds <- DESeq(dds, parallel = FALSE)

Registering the SerialParam() as the default put it at the top of the list, but did not make a difference when running DESeq.

register(SerialParam(), default = TRUE)
registered() 

$SerialParam
class: SerialParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA

$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

$SnowParam
class: SnowParam
  bpisup: FALSE; bpnworkers: 78; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: SOCK

I've also tried passing 1 worker to BiocParallel, but it ends up engaging more cores (about 60).

dds <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(1))
estimating size factors
estimating dispersions
gene-wise dispersion estimates: 1 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 1 workers
-- replacing outliers and refitting for 307 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

Narrowing it down, it looks like the estimateDispersions function is where the core usage gets out of hand.

dds <- estimateSizeFactors(dds) # seems to be fine
dds <- estimateDispersions(dds) # engages many cores

Any thoughts on how to deal with this? Am I setting something incorrectly? The main issue is that I'm trying to run other things in parallel so I really need DESeq2 to run single-core for this analysis.

Thank you!

David

Session info

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.24.0               SummarizedExperiment_1.14.1 DelayedArray_0.10.0         BiocParallel_1.18.1         matrixStats_0.55.0         
 [6] Biobase_2.44.0              GenomicRanges_1.36.1        GenomeInfoDb_1.20.0         IRanges_2.18.3              S4Vectors_0.22.1           
[11] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.6.1          Formula_1.2-3          assertthat_0.2.1       latticeExtra_0.6-28    blob_1.2.0            
 [7] GenomeInfoDbData_1.2.1 pillar_1.4.2           RSQLite_2.1.2          backports_1.1.5        lattice_0.20-38        glue_1.3.1            
[13] digest_0.6.21          RColorBrewer_1.1-2     XVector_0.24.0         checkmate_1.9.4        colorspace_1.4-1       htmltools_0.4.0       
[19] Matrix_1.2-17          XML_3.98-1.20          pkgconfig_2.0.3        genefilter_1.66.0      zlibbioc_1.30.0        purrr_0.3.2           
[25] xtable_1.8-4           scales_1.0.0           htmlTable_1.13.2       tibble_2.1.3           annotate_1.62.0        ggplot2_3.2.1         
[31] nnet_7.3-12            lazyeval_0.2.2         survival_2.44-1.1      magrittr_1.5           crayon_1.3.4           memoise_1.1.0         
[37] foreign_0.8-72         tools_3.6.1            data.table_1.12.4      stringr_1.4.0          locfit_1.5-9.1         munsell_0.5.0         
[43] cluster_2.1.0          AnnotationDbi_1.46.1   packrat_0.5.0          compiler_3.6.1         rlang_0.4.0            grid_3.6.1            
[49] RCurl_1.95-4.12        rstudioapi_0.10        htmlwidgets_1.5.1      bitops_1.0-6           base64enc_0.1-3        gtable_0.3.0          
[55] DBI_1.0.0              R6_2.4.0               gridExtra_2.3          knitr_1.25             dplyr_0.8.3            bit_1.1-14            
[61] zeallot_0.1.0          Hmisc_4.2-0            stringi_1.4.3          Rcpp_1.0.2             geneplotter_1.62.0     vctrs_0.2.0           
[67] rpart_4.1-15           acepack_1.4.1          tidyselect_0.2.5       xfun_0.10   
deseq2 parallel • 61 views
ADD COMMENTlink modified 12 days ago by Michael Love25k • written 12 days ago by symbiologist0
Answer: DESeq2 using many cores despite parallel = FALSE
0
gravatar for Michael Love
12 days ago by
Michael Love25k
United States
Michael Love25k wrote:

I don't think DESeq2 is doing anything that would be different than other R tasks would do. E.g. take a look at what happens when you multiply two large matrices.

There's no secret parallel code that gets engaged. The only way that BiocParallel::bplapply is used is if parallel=TRUE.

ADD COMMENTlink written 12 days ago by Michael Love25k

Hmm good idea, I tried that with the following code:

NCols <- 100000
NRows <- 100000
myMat1 <- matrix(runif(NCols*NRows), ncol=NCols)
myMat2 <- matrix(runif(NCols*NRows), ncol=NCols)

myMat1 * myMat2

And I'm only seeing a single core is engaged.

In any case, if what you're saying is correct, does that mean something about how my R is set up is the culprit? How would I go about tracking this down?

I've only run into this with DESeq2 thus far (specifically the functions estimateDispersions varianceStabilizingTransform) but nothing else. If there is another place that would be better to post this issue, please let me know!

Thank you!

David

ADD REPLYlink written 10 days ago by symbiologist0

I don’t have any great ideas here. But there’s nothing engaging the BiocParallel infrastructure unless parallel=TRUE.

ADD REPLYlink written 10 days ago by Michael Love25k

you could try trace(bplapply) (or maybe trace(BiocParallel::bplapply)) before running estimateSizeFactors(); you'll see a trace print every time bplapply() is evaluated. You could also add an expression to trace() to document how bplapply() is being called...

> trace(bplapply, quote(print(BPPARAM)))
Tracing function "bplapply" in environment <namespace:BiocParallel>
[1] "bplapply"
attr(,"package")
[1] "BiocParallel"
> res = bplapply(1:10, sqrt)     # use estimateSizeFactors() here...
Tracing bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM) on entry
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 10; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

If there's no output then definitely BiocParallel::bplapply() is not being called.

ADD REPLYlink written 10 days ago by Martin Morgan ♦♦ 23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 185 users visited in the last hour