Question: BiocParallel::bplapply() performance with MulticoreParam() is worse than mclapply()
Asked 5 months ago by davide risso (Weill Cornell Medicine):

Dear all,

I'm having some performance issues with BiocParallel::bplapply that I think are somewhat related to this old post:

BiocParallel::bplapply() performance issue

I have started a new post because I'm using a much newer version of BiocParallel here (1.11.2), but I will use the same example:

> library(parallel)
> library(BiocParallel)
> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.020   0.001   0.022
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed
  0.010   0.000   0.011
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.022   0.003   0.025

Although bplapply and mclapply perform comparably with one worker, bplapply becomes much slower than mclapply when I increase the number of workers to 2. This holds regardless of the number of `tasks` and, as in the linked post, seems to depend on which packages are loaded. Going back to the example from the old post, I get:

> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=2))
   user  system elapsed
  0.002   0.006   0.015
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 2)))
   user  system elapsed
  0.053   0.018   0.204
> library(SummarizedExperiment)
> library(matrixStats)
> library(magrittr)
> library(ggplot2)
> library(biomaRt)
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 2)))
   user  system elapsed
  0.047   0.014   1.005
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 2, tasks = 2)))
   user  system elapsed 
  0.005   0.006   0.964

Note that the packages that I attached here are those that I load in my vignette, where I first noticed the problem, but it appears that just loading SummarizedExperiment will cause the same issue.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] biomaRt_2.33.1             ggplot2_2.2.1             
 [3] magrittr_1.5               scRNAseq_1.3.0            
 [5] SummarizedExperiment_1.7.4 DelayedArray_0.3.6        
 [7] matrixStats_0.52.2         Biobase_2.37.2            
 [9] GenomicRanges_1.29.4       GenomeInfoDb_1.13.2       
[11] IRanges_2.11.3             S4Vectors_0.15.3          
[13] BiocGenerics_0.23.0        BiocParallel_1.11.2       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11            compiler_3.4.0          plyr_1.8.4             
 [4] XVector_0.17.0          prettyunits_1.0.2       bitops_1.0-6           
 [7] tools_3.4.0             zlibbioc_1.23.0         progress_1.1.2         
[10] digest_0.6.12           RSQLite_1.1-2           memoise_1.1.0          
[13] tibble_1.3.3            gtable_0.2.0            lattice_0.20-35        
[16] rlang_0.1.1             Matrix_1.2-10           DBI_0.6-1              
[19] GenomeInfoDbData_0.99.0 grid_3.4.0              R6_2.2.1               
[22] AnnotationDbi_1.39.0    XML_3.98-1.7            scales_0.4.1           
[25] assertthat_0.2.0        colorspace_1.3-2        RCurl_1.95-4.8         
[28] lazyeval_0.2.0          munsell_0.4.3 
Written 5 months ago by davide risso (modified 5 months ago)

I'll work further on this, noting that the original problem was much more severe than the one reported here.

If you are using parallel evaluation multiple times, the cost of establishing the cluster can be minimized by opening it first, e.g.,

> register(bpstart(MulticoreParam(workers=2)))
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3))))
   user  system elapsed 
  0.004   0.000   0.110 

I don't really think that the use case implied by the test -- many very fast iterations -- is the right context for R-level parallel evaluation; just do the operation without the complexity of parallelization, e.g., order(rnorm(n=1e3 * 1e2)). This is especially true for code in a package, where approximately half of our users will be on Windows and using independent processes, along the lines of

> library(BiocParallel)
> register(SnowParam(2))
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3))))
   user  system elapsed 
  0.080   0.000   0.865

Windows users must necessarily pay the cost of starting separate processes. Also, R-level code (casting no aspersions!) can often be written to run two or more orders of magnitude faster by using vectorization rather than iteration. In new package submissions, my response when I see the use of parallel packages of any sort is to ask whether the code itself should be refactored, which usually results in simpler, much faster, and more robust code. The usual steps are to 'hoist' constant sub-expressions out of loops, then hoist vectorizable sub-expressions out of the loop as pre-computed vectors.
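
As a rough illustration of the two hoisting steps, here is a minimal sketch in base R. The matrix `m`, the constant `scale_factor`, and the per-column sum are made up for the example and do not come from any package discussed in this thread.

```r
# Illustrative data only: a 100 x 100 matrix of random values.
set.seed(1)
m <- matrix(rnorm(1e4), nrow = 100)

# Iterative version: the constant is recomputed on every pass through
# the loop, and each column is processed one at a time.
slow <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
    scale_factor <- 1 / log(2)              # constant sub-expression
    slow[j] <- sum(m[, j] * scale_factor)   # vectorizable sub-expression
}

# Step 1: hoist the constant out of the loop.
scale_factor <- 1 / log(2)

# Step 2: replace the loop with a vectorized call over all columns.
fast <- colSums(m) * scale_factor

# Both versions agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(slow, fast)))
```

With code in this shape there is often no need for a parallel back-end at all.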

When the granularity of the task is larger, then the overhead of parallel evaluation becomes unimportant.
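
One possible way to coarsen the granularity of the toy example is to hand each worker a single chunk of iterations rather than many tiny tasks. The sketch below uses parallel::mclapply from the original question; the chunking scheme and the names `n_iter`, `workers`, `cores`, and `chunks` are assumptions made for illustration, not a recommendation from BiocParallel.

```r
library(parallel)

n_iter  <- 100   # number of fast iterations, as in the example above
workers <- 2     # assumed number of workers

# mclapply relies on forking, which is not available on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else workers

# Split the iteration indices into one contiguous chunk per worker.
chunks <- split(seq_len(n_iter),
                cut(seq_len(n_iter), workers, labels = FALSE))

# Each parallel task now runs a whole chunk with a plain lapply().
per_chunk <- mclapply(chunks, function(idx) {
    lapply(idx, function(x) order(rnorm(n = 1e3)))
}, mc.cores = cores)

# Flatten the per-chunk results back into one list of n_iter elements.
res <- unlist(per_chunk, recursive = FALSE, use.names = FALSE)
stopifnot(length(res) == n_iter)
```

Whether this pays off depends on how much work each chunk contains relative to the cost of starting the workers.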

 

Written 5 months ago by Martin Morgan

Thanks Martin and Johannes! Both of your suggestions are appreciated!

I agree with Martin's point on vectorizing operations, but I came across this behavior and wanted to get your opinion on this.

Written 5 months ago by davide risso

Doesn't really answer your question, but since I also experienced problems with MulticoreParam on macOS...

On Mac I switched from MulticoreParam to DoparParam, i.e. I'm using the doParallel package for parallel processing. I had the feeling that multicore/MulticoreParam had a problem with the forks, so I prefer registering the number of processes beforehand:

library(BiocParallel)
library(doParallel)
registerDoParallel(2)

## First using Multicore:
register(MulticoreParam(2))
system.time(bplapply(1:1e2 , function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.107   0.029   0.329

## Now with doPar:
register(DoparParam())
system.time(bplapply(1:1e2 , function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.040   0.020   0.041

 

My sessionInfo:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.7.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] BiocParallel_1.10.1 doParallel_1.0.10   iterators_1.0.8    
[4] foreach_1.4.3      

loaded via a namespace (and not attached):
[1] compiler_3.4.0   tools_3.4.0      codetools_0.2-15

 

Written 5 months ago by Johannes Rainer