Question

BPPARAM = bpparam() takes longer than using BPPARAM = SerialParam()


benjamin.phillips22 ▴ 10


I wrote out all the details on Stack Overflow:

http://stackoverflow.com/questions/36375667/biocparallel-bpparam-bpparam-takes-longer-than-using-bpparam-serialparam

And here it is again.

I'm running Ubuntu 14.04 (64-bit) on an Intel® Core™ i5-2410M CPU @ 2.30GHz × 4.

I just installed the BiocParallel package and it's not running as I expected.

First I ran some code sequentially using SerialParam() and recorded the times:

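library(BiocParallel)

test1 <- function() {
    pmt <- proc.time()                              # start the clock
    bplapply(1:1e6, sqrt, BPPARAM = SerialParam())  # one sqrt() call per element, serially
    print(proc.time() - pmt)                        # user, system and elapsed time
}
test1()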
# Times
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed 
# 0.760   0.005   0.768 
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed 
# 0.733   0.000   0.730

These make sense.

Then I tried running on parallel cores by using bpparam():

test2 <- function() {
    pmt <- proc.time()                          # start the clock
    bplapply(1:1e6, sqrt, BPPARAM = bpparam())  # same work on the default back-end
    print(proc.time() - pmt)                    # user, system and elapsed time
}
test2()
# Times
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed 
# 1.083   0.082  26.079 
# > source('~/R/hello_world/biocParallel_test.R')
# user  system elapsed 
# 0.855   0.076  25.654

As you can see from the picture below, the user time isn't correct. The user time is actually the elapsed time. The other weird thing is: why is the elapsed time so high? More cores should mean the elapsed time is about the same and the user time much less, but that wasn't what I found. Am I using BiocParallel incorrectly?

[image not preserved]

Here's another image showing that two cores are indeed being used when I run the second bit of code.

[image not preserved]

bplapply • 5.4k views

link updated 8.0 years ago by Martin Morgan 25k • written 8.0 years ago by benjamin.phillips22 ▴ 10

score 3 · Answer 1 · 2016-04-08

Sorry, I didn't see your question. Basically, there's overhead to start parallel processes and to communicate between them, so the 'work' has to justify going parallel. This is my favorite example: sleeping serially (taking 5 seconds) versus in parallel across five workers (everyone sleeps for 1s, so the best case is that the parallel job takes 1s).

> library(BiocParallel)
> f = function(i) { Sys.sleep(1); i }
> system.time(bplapply(1:5, f, BPPARAM=SerialParam()))
   user  system elapsed 
  0.006   0.000   5.011 
> system.time(bplapply(1:5, f, BPPARAM=MulticoreParam(5)))
   user  system elapsed 
  0.032   0.008   1.257

So there is a fairly substantial cost: about 0.25s on top of the 1s best case. If there's more work per task, the overhead is smaller relative to the work and the gain from parallel processing is larger, so

> F = function(i) { Sys.sleep(5); i }
> system.time(bplapply(1:5, F, BPPARAM=MulticoreParam(5)))
   user  system elapsed 
  0.017   0.005   5.247
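To see how the overhead compares with the amount of work, one could time the same five-task job at a few different task lengths. This is only a sketch: elapsed_for() is a made-up helper, not part of BiocParallel, and the numbers will differ from machine to machine.

```r
library(BiocParallel)

# Hypothetical helper: elapsed seconds for five tasks that each sleep `secs`.
elapsed_for <- function(secs, BPPARAM) {
    f <- function(i, secs) { Sys.sleep(secs); i }
    system.time(bplapply(1:5, f, secs = secs, BPPARAM = BPPARAM))[["elapsed"]]
}

task_secs <- c(0.1, 0.5, 1)
param <- MulticoreParam(5)  # as above; forking is not available on Windows

elapsed <- vapply(task_secs, elapsed_for, numeric(1), BPPARAM = param)

# With five workers the best case is one task length, so whatever is left
# after subtracting it is overhead.
data.frame(task_secs, elapsed,
           overhead = elapsed - task_secs,
           overhead_ratio = (elapsed - task_secs) / task_secs)
```

If the overhead column stays roughly constant while overhead_ratio shrinks, that is the "work has to justify going parallel" effect.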

A lot of the overhead is 'startup', which you can get a sense for (and eliminate) by starting the back-end first (available for some back-ends):

> p = bpstart(MulticoreParam(5))
> system.time(bplapply(1:5, f, BPPARAM=p))
   user  system elapsed 
  0.005   0.000   1.108

I think the user time that R reports is actually the user time of the 'manager' process, that is, the time spent orchestrating the parallel computation. The elapsed time is the wall-clock time.
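One way to look into this is to inspect the individual components of the timing object rather than the three numbers that get printed. A small sketch, assuming a platform where forked workers are available:

```r
library(BiocParallel)

f <- function(i) { Sys.sleep(1); i }
st <- system.time(bplapply(1:5, f, BPPARAM = MulticoreParam(5)))

# A proc_time object has five components; printing it shows a
# three-number summary (user, system, elapsed).
names(st)
# "user.self" "sys.self" "elapsed" "user.child" "sys.child"

st[["elapsed"]]     # wall-clock time
st[["user.self"]]   # CPU time used by this (manager) R process
st[["user.child"]]  # CPU time of child processes that this process waited for
                    # (NA on platforms that do not report it)
```

Comparing user.self with user.child should show how much CPU time is attributed to the manager and how much, if any, to the workers.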

And finally, in R it would be natural to use the vectorized sqrt() directly rather than to iterate, so simply sqrt(1:1e6). If the function you're employing is vectorized but the elements of the vector take some effort to calculate (again, sqrt would be a poor example, because the calculation is already 'fast'), then one could use bpvec():

> system.time(bpvec(1:1e6, sqrt, BPPARAM=p))
   user  system elapsed 
  0.527   0.012   0.861

This splits X=1:1e6 into approximately equal parts (by default; it's possible to arrange for more tasks for a kind of load-balancing), sending one part to each worker. This is much more efficient than iterating in parallel, as a comparison with your times above shows.
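Roughly the same thing can be written by hand, which may make it clearer what bpvec() is doing. This is a sketch of the idea rather than how bpvec() is actually implemented; the two-worker setup and the variable names are just for illustration.

```r
library(BiocParallel)

x <- 1:1e6
n_workers <- 2                      # assumption: two workers, for illustration
p <- MulticoreParam(n_workers)

# Split x into n_workers contiguous, roughly equal chunks.
chunks <- split(x, cut(seq_along(x), n_workers, labels = FALSE))

# One vectorized sqrt() call per chunk, then glue the pieces back together.
manual <- unlist(bplapply(chunks, sqrt, BPPARAM = p), use.names = FALSE)

# bpvec() does the splitting and recombining itself.
via_bpvec <- bpvec(x, sqrt, BPPARAM = p)

all.equal(manual, via_bpvec)
```

Each worker receives one large vector instead of half a million single numbers, so there are only a couple of round trips between manager and workers.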

If you started the back-end manually, don't forget to stop it:

> bpstop(p)
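To make it harder to forget, the start and stop could be wrapped in a function that uses on.exit(). A minimal sketch; with_workers() is a hypothetical helper, not something BiocParallel provides.

```r
library(BiocParallel)

# Hypothetical wrapper: start the workers once, reuse them, always stop them.
with_workers <- function(n, FUN) {
    p <- bpstart(MulticoreParam(n))
    on.exit(bpstop(p), add = TRUE)   # runs even if FUN() throws an error
    FUN(p)
}

res <- with_workers(2, function(p) {
    a <- bplapply(1:4, sqrt, BPPARAM = p)   # both calls reuse the same workers
    b <- bpvec(1:1e6, sqrt, BPPARAM = p)
    list(a = a, b = b)
})
```

The startup cost is paid once for both calls, and the workers are shut down when with_workers() returns.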