Question: BPPARAM = bpparam() takes longer than using BPPARAM = SerialParam()
3.5 years ago by
benjamin.phillips2210 wrote:

I posted all the details on Stack Overflow:

http://stackoverflow.com/questions/36375667/biocparallel-bpparam-bpparam-takes-longer-than-using-bpparam-serialparam

Here they are again.

I'm running Ubuntu 14.04 (64-bit) on an Intel® Core™ i5-2410M CPU @ 2.30GHz × 4.

R version 3.2.4 Revised (2016-03-16 r70336) -- "Very Secure Dishes" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

I just installed the BiocParallel package and it's not behaving as I expected.

First I ran some code in sequential order using SerialParam() and recorded the times.

library(BiocParallel)

test1 <- function(){
    pmt <- proc.time()
    bplapply(1:1e6, sqrt, BPPARAM = SerialParam())
    print(proc.time() - pmt)
}
# Times
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed
# 0.760   0.005   0.768
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed
# 0.733   0.000   0.730 

These make sense.

Then I tried running on multiple cores by using bpparam():

test2 <- function(){
    pmt <- proc.time()
    bplapply(1:1e6, sqrt, BPPARAM = bpparam())
    print(proc.time() - pmt)
}
# Times
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed
# 1.083   0.082  26.079
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed
# 0.855   0.076  25.654 

As the picture below shows, the reported user time isn't correct: the actual user time appears to be what is reported as the elapsed time. The other odd thing is the elapsed time itself: why is it so high? I expected that with more cores the elapsed time would be about the same and the user time much lower, but that isn't what I found. Am I using BiocParallel incorrectly?

Here's another image showing that two cores are indeed being used when I run the second bit of code.

modified 3.5 years ago by Martin Morgan ♦♦ 23k • written 3.5 years ago by benjamin.phillips2210
Answer: BPPARAM = bpparam() takes longer than using BPPARAM = SerialParam()
3.5 years ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:

Sorry, I didn't see your question. Basically there's overhead to start parallel processes and to communicate between them, so the 'work' has to justify going parallel. This is my favorite example: sleeping serially (taking 5 seconds) versus in parallel across five workers (everyone sleeps for 1s, so the best case is that the parallel job takes 1s).

> library(BiocParallel)
> f = function(i) { Sys.sleep(1); i }
> system.time(bplapply(1:5, f, BPPARAM=SerialParam()))
user  system elapsed
0.006   0.000   5.011
> system.time(bplapply(1:5, f, BPPARAM=MulticoreParam(5)))
user  system elapsed
0.032   0.008   1.257 

So there is a fairly substantial cost (about 0.25s here). If there's more work per task, then the overhead is proportionally smaller and there is more gain from parallel processing, so

> F = function(i) { Sys.sleep(5); i }
> system.time(bplapply(1:5, F, BPPARAM=MulticoreParam(5)))
user  system elapsed
0.017   0.005   5.247 
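
As a back-of-the-envelope check on the timings above, here is a rough cost model in base R. The function names and the fixed 0.25s overhead are assumptions made for illustration (the overhead is read off the 1.257s result); the model ignores the cost of moving data to and from workers, so treat it as a sketch rather than a description of what BiocParallel does internally.

```r
# Rough cost model (hypothetical helpers, not BiocParallel functions):
# elapsed ~ fixed overhead + number of "rounds" * time per task.
est_parallel <- function(n_tasks, task_secs, workers, overhead_secs) {
    rounds <- ceiling(n_tasks / workers)   # tasks each worker handles in turn
    overhead_secs + rounds * task_secs
}
est_serial <- function(n_tasks, task_secs) n_tasks * task_secs

overhead <- 0.25   # assumed: 1.257s observed vs. 1s ideal in the example above

est_serial(5, 1)                   # 5
est_parallel(5, 1, 5, overhead)    # 1.25
est_parallel(5, 5, 5, overhead)    # 5.25
```

Under this model the overhead is 20% of the run when each task takes 1s, but only about 5% when each task takes 5s.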

A lot of the overhead is 'startup', which you can get a sense for (and eliminate) by starting the back-end first (available for some back-ends):

> p = bpstart(MulticoreParam(5))
> system.time(bplapply(1:5, f, BPPARAM=p))
user  system elapsed
0.005   0.000   1.108


I think the way user time is implemented in R is that it is actually the user time of the 'manager', which is the time spent orchestrating the parallel computation. The elapsed time is the wall-clock time.
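
To see which components system.time() reports, one can inspect the returned object directly. This is a small base-R sketch; the 0.5s sleep is an arbitrary choice, and whether the child-process entries are filled in depends on the platform.

```r
# Sleeping consumes wall-clock time but almost no CPU time in this process.
tm <- system.time(Sys.sleep(0.5))

names(tm)            # "user.self" "sys.self" "elapsed" "user.child" "sys.child"
tm[["user.self"]]    # CPU time used by this R process
tm[["elapsed"]]      # wall-clock time
tm[["user.child"]]   # CPU time of child processes; may be NA on some platforms
```

Time spent waiting (for a sleep, or for workers to finish) shows up in the elapsed entry but not in user.self.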

And finally, in R it would be natural to vectorize sqrt() rather than to iterate, so simply sqrt(1:1e6). If the function you're employing is vectorized but the elements of the vector take some effort to calculate (again, sqrt would be a poor example, because the calculation is already 'fast'), then one could use bpvec():

> system.time(bpvec(1:1e6, sqrt, BPPARAM=p))
user  system elapsed
0.527   0.012   0.861 

This splits X=1:1e6 into approximately equal parts (by default; it's possible to arrange for more tasks for a kind of load-balancing), sending one part to each worker. This is much more efficient than iterating in parallel, as a comparison with your times above shows.
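
The chunking idea can be sketched in base R without any parallelism. This is a conceptual illustration only, not BiocParallel's actual implementation; the worker count of 4 is an assumption, and parallel::splitIndices() is used here simply as a convenient way to cut the index range into contiguous blocks.

```r
# Conceptual sketch of chunked evaluation (not BiocParallel's actual code).
x <- 1:1e6
n_workers <- 4L                      # assumed worker count

# splitIndices() divides 1..length(x) into n_workers contiguous index blocks.
idx <- parallel::splitIndices(length(x), n_workers)

# One vectorized call per chunk: 4 calls to sqrt() instead of 1e6.
parts <- lapply(idx, function(i) sqrt(x[i]))

# Concatenate the per-chunk results back into a single vector.
res <- unlist(parts, use.names = FALSE)
```

Each chunk would be one task for a worker, so the per-task overhead is paid a handful of times rather than once per element.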

If started manually, don't forget to stop the back-end:

> bpstop(p)
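
One way to make the clean-up hard to forget is to wrap the start/stop pair in a function and register bpstop() with on.exit(). The helper name with_backend and the default of 2 workers are hypothetical, chosen only for this sketch; it assumes a platform where MulticoreParam is supported (on Windows, SnowParam would be the usual substitute).

```r
library(BiocParallel)

# Hypothetical helper: run FUN over X on a pre-started back-end,
# and always stop the back-end afterwards.
with_backend <- function(X, FUN, workers = 2L) {
    p <- bpstart(MulticoreParam(workers))
    on.exit(bpstop(p), add = TRUE)   # runs even if bplapply() signals an error
    bplapply(X, FUN, BPPARAM = p)
}

res <- with_backend(1:4, function(i) i^2)
```

If several bplapply() calls should share the same workers, the same pattern applies with all of the calls placed between bpstart() and the function's exit.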