Question: BiocParallel::bplapply() performance issue
2.2 years ago, peter.kharchenko wrote:

We had to switch to using bplapply(), and with the later version of the package we started to encounter serious performance issues. Here's an example that evaluates a simple function using lapply(), mclapply(), and bplapply() with a low number of cores (one worker):

> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed
  0.016   0.000   0.016
> require(parallel)
Loading required package: parallel
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed
  0.016   0.000   0.015
> require(BiocParallel)
Loading required package: BiocParallel
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.196   0.020   9.953

The bplapply() elapsed time surges to about 10 s, compared with about 0.016 s for lapply() and mclapply(), and it grows in proportion to the number of elements in the list.
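
One way to check the proportionality claim is to time the same call over several list lengths. The following is only a sketch: the helper name `time_bplapply` and the chosen sizes are illustrative assumptions, not part of the original report, and it relies on the fork-based MulticoreParam used above (so Linux or macOS).

```r
library(BiocParallel)

# Hypothetical helper (not from the original post): elapsed seconds for
# one bplapply() call over a list of n elements.
time_bplapply <- function(n, workers = 1) {
    param <- MulticoreParam(workers = workers)
    unname(system.time(
        bplapply(seq_len(n), function(x) order(rnorm(n = 1e3)),
                 BPPARAM = param)
    )["elapsed"])
}

# Assumed list lengths; any increasing sequence would do.
sizes <- c(25, 50, 100, 200)
elapsed <- vapply(sizes, time_bplapply, numeric(1))

# A roughly constant per_element column would indicate linear scaling.
data.frame(n = sizes, elapsed = elapsed, per_element = elapsed / sizes)
```

On an affected version this may take well over half a minute in total, since each element appears to add a fixed delay.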

This is using BiocParallel_1.2.22 (full sessionInfo() below). 

The problem does not occur when using an older version of BiocParallel (BiocParallel_1.0.3) :

> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.016   0.004   0.023

Also, the runtime with the newer version (1.2.22) is affected by loading other packages. For instance, loading mgcv roughly doubles the elapsed time of that simple command (from about 9.3 s to about 20.6 s):

> library(BiocParallel)
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.204   0.020   9.272
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
> system.time(BiocParallel::bplapply(1:1e2 , function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed
  0.184   0.008  20.569

Unfortunately this effect grinds our package to a halt in some situations, so I would appreciate your input. 

Full sessionInfo() below:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] mgcv_1.8-7          nlme_3.1-122        BiocParallel_1.2.22

loaded via a namespace (and not attached):
[1] Matrix_1.2-2         futile.logger_1.4.1  lambda.r_1.1.7
[4] futile.options_1.0.0 grid_3.2.2           lattice_0.20-33

Best,

-peter.


Thanks, obviously the performance is not satisfactory; we will look into this.

The current Bioconductor release is 3.2, where BiocParallel is at version 1.4.1 (this does not help the performance issue, but will be the version where updates are introduced). The "Upgrading installed Bioconductor packages" instructions may help get you to the current version.

(Comment written 2.2 years ago by Martin Morgan)
2.2 years ago, Martin Morgan (United States) wrote:

This is fixed in BiocParallel 1.4.3:

> system.time(lapply(1:1e2,function(x) order(rnorm(n=1e3))))
   user  system elapsed 
  0.015   0.000   0.015 
>   ##  user  system elapsed
>   ## 0.016   0.000   0.016
> require(parallel)
Loading required package: parallel
> ## Loading required package: parallel
> system.time(mclapply(1:1e2,function(x) order(rnorm(n=1e3)),mc.cores=1))
   user  system elapsed 
  0.016   0.000   0.016 
>   ##  user  system elapsed
>   ## 0.016   0.000   0.015
> 
> require(BiocParallel)
Loading required package: BiocParallel
> system.time({
+     res0 <- bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1))
+ })
   user  system elapsed 
  0.023   0.000   0.022 
> 
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-10. For overview type 'help("mgcv-package")'.
> system.time(bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1)))
   user  system elapsed 
  0.022   0.000   0.022 
>   ##  user  system elapsed
>   ## 0.184   0.008  20.569

A work-around in previous versions may be to explicitly set the number of tasks equal to the number of workers.

> system.time({
+     res1 <- bplapply(1:1e2 , function(x) order(rnorm(n=1e3)),
+                      BPPARAM = MulticoreParam(workers = 1, tasks=1))
+ })
   user  system elapsed 
  0.020   0.000   0.021 
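
For more than one worker, the same idea might be written as below. This is a sketch under assumptions: the worker count of 2 is arbitrary, the 1.4.3 version threshold is taken from the statement above that the fix landed there, and whether the `tasks` argument is accepted by every older release has not been checked here.

```r
library(BiocParallel)

n_workers <- 2L  # assumed value; adjust to the machine

# Versions at or after 1.4.3 are reported as fixed, so only older
# versions get the tasks = workers work-around.
if (packageVersion("BiocParallel") >= "1.4.3") {
    param <- MulticoreParam(workers = n_workers)
} else {
    param <- MulticoreParam(workers = n_workers, tasks = n_workers)
}

res <- bplapply(1:1e2, function(x) order(rnorm(n = 1e3)), BPPARAM = param)
length(res)
```

Setting tasks equal to workers splits the input into one chunk per worker, which avoids the per-element overhead seen in the timings above.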

Thanks for the report.
