DESeq2 using many cores despite parallel = FALSE
1
0
Entering edit mode
@symbiologist-22093
Last seen 4.0 years ago

Hi all,

Trying to run DESeq2 and I'm getting many cores engaged despite setting parallel = FALSE and trying to set BiocParallel register(SerialParam(), default = TRUE).

Simply doing the following leads to 36 cores engaged, which has been problematic because I've needed the cores for other things:

library(DESeq2)
set <- readRDS('~/expressionset.rds')

dds <- DESeqDataSetFromMatrix(counts(set),
                              colData = pData(set),
                              design = ~condition) 

dds <- DESeq(dds, parallel = FALSE)

Registering the SerialParam() as the default put it at the top of the list, but did not make a difference when running DESeq.

register(SerialParam(), default = TRUE)
registered() 

$SerialParam
class: SerialParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA

$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

$SnowParam
class: SnowParam
  bpisup: FALSE; bpnworkers: 78; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: SOCK

I've also tried passing 1 worker to BiocParallel, but it ends up engaging more cores (about 60).

dds <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(1))
estimating size factors
estimating dispersions
gene-wise dispersion estimates: 1 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 1 workers
-- replacing outliers and refitting for 307 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

Narrowing it down, it looks like the estimateDispersions function is where the core usage gets out of hand.

dds <- estimateSizeFactors(dds) # seems to be fine
dds <- estimateDispersions(dds) # engages many cores

Any thoughts on how to deal with this? Am I setting something incorrectly? The main issue is that I'm trying to run other things in parallel so I really need DESeq2 to run single-core for this analysis.

Thank you!

David

Session info

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.24.0               SummarizedExperiment_1.14.1 DelayedArray_0.10.0         BiocParallel_1.18.1         matrixStats_0.55.0         
 [6] Biobase_2.44.0              GenomicRanges_1.36.1        GenomeInfoDb_1.20.0         IRanges_2.18.3              S4Vectors_0.22.1           
[11] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.6.1          Formula_1.2-3          assertthat_0.2.1       latticeExtra_0.6-28    blob_1.2.0            
 [7] GenomeInfoDbData_1.2.1 pillar_1.4.2           RSQLite_2.1.2          backports_1.1.5        lattice_0.20-38        glue_1.3.1            
[13] digest_0.6.21          RColorBrewer_1.1-2     XVector_0.24.0         checkmate_1.9.4        colorspace_1.4-1       htmltools_0.4.0       
[19] Matrix_1.2-17          XML_3.98-1.20          pkgconfig_2.0.3        genefilter_1.66.0      zlibbioc_1.30.0        purrr_0.3.2           
[25] xtable_1.8-4           scales_1.0.0           htmlTable_1.13.2       tibble_2.1.3           annotate_1.62.0        ggplot2_3.2.1         
[31] nnet_7.3-12            lazyeval_0.2.2         survival_2.44-1.1      magrittr_1.5           crayon_1.3.4           memoise_1.1.0         
[37] foreign_0.8-72         tools_3.6.1            data.table_1.12.4      stringr_1.4.0          locfit_1.5-9.1         munsell_0.5.0         
[43] cluster_2.1.0          AnnotationDbi_1.46.1   packrat_0.5.0          compiler_3.6.1         rlang_0.4.0            grid_3.6.1            
[49] RCurl_1.95-4.12        rstudioapi_0.10        htmlwidgets_1.5.1      bitops_1.0-6           base64enc_0.1-3        gtable_0.3.0          
[55] DBI_1.0.0              R6_2.4.0               gridExtra_2.3          knitr_1.25             dplyr_0.8.3            bit_1.1-14            
[61] zeallot_0.1.0          Hmisc_4.2-0            stringi_1.4.3          Rcpp_1.0.2             geneplotter_1.62.0     vctrs_0.2.0           
[67] rpart_4.1-15           acepack_1.4.1          tidyselect_0.2.5       xfun_0.10   
deseq2 parallel • 1.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

I don't think DESeq2 is doing anything that would be different than other R tasks would do. E.g. take a look at what happens when you multiply two large matrices.

There's no secret parallel code that gets engaged. The only way that BiocParallel::bplapply is used is if parallel=TRUE.

ADD COMMENT
0
Entering edit mode

Hmm good idea, I tried that with the following code:

NCols <- 100000
NRows <- 100000
myMat1 <- matrix(runif(NCols*NRows), ncol=NCols)
myMat2 <- matrix(runif(NCols*NRows), ncol=NCols)

myMat1 * myMat2

And I'm only seeing a single core is engaged.

In any case, if what you're saying is correct, does that mean something about how my R is set up is the culprit? How would I go about tracking this down?

I've only run into this with DESeq2 thus far (specifically the functions estimateDispersions varianceStabilizingTransform) but nothing else. If there is another place that would be better to post this issue, please let me know!

Thank you!

David

ADD REPLY
0
Entering edit mode

I don’t have any great ideas here. But there’s nothing engaging the BiocParallel infrastructure unless parallel=TRUE.

ADD REPLY
0
Entering edit mode

you could try trace(bplapply) (or maybe trace(BiocParallel::bplapply)) before running estimateSizeFactors(); you'll see a trace print every time bplapply() is evaluated. You could also add an expression to trace() to document how bplapply() is being called...

> trace(bplapply, quote(print(BPPARAM)))
Tracing function "bplapply" in environment <namespace:BiocParallel>
[1] "bplapply"
attr(,"package")
[1] "BiocParallel"
> res = bplapply(1:10, sqrt)     # use estimateSizeFactors() here...
Tracing bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM) on entry
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 10; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

If there's no output then definitely BiocParallel::bplapply() is not being called.

ADD REPLY

Login before adding your answer.

Traffic: 818 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6