BiocParallel: messages shown only when processes finished
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Italy

Hi,

I am not sure whether this is intentional, but I noticed that output from message() and cat() calls inside functions executed by bplapply() is shown only after the bplapply() call has finished. I also tried calling flush.console() after each message() call, but that did not help either.

library(BiocParallel)

myFun <- function(x) {
    message("Element ", x, " start")
    Sys.sleep(2)
    message("Element ", x, " end")
}

register(SerialParam())
tmp <- bplapply(1:6, myFun)
Element 1 start
Element 1 end
Element 2 start
Element 2 end
Element 3 start
Element 3 end
Element 4 start
Element 4 end
Element 5 start
Element 5 end
Element 6 start
Element 6 end
## Works nicely

## Using MulticoreParam:
register(MulticoreParam(2))
tmp <- bplapply(1:6, myFun)
Element 4 start
Element 4 end
Element 5 start
Element 5 end
Element 6 start
Element 6 end
Element 1 start
Element 1 end
Element 2 start
Element 2 end
Element 3 start
Element 3 end

## Console output is shown only after the bplapply call has finished

Is there a way to enable the immediate output of message or cat calls?

Additionally, at least on my system, the progress bar does not advance either: it shows 0% and, only after everything has finished, jumps to 50% and then 100%.

 

thanks, jo

 

My sessionInfo:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.6.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0

 

@martin-morgan-1513
United States

The model is that messages and progress are reported each time a 'worker' reports back to the 'manager'. By default, the overall 'job' passed to bplapply is divided as evenly as possible into a list of n tasks, where n is the number of workers. Each task consists of some number of elements. The tasks are sent to the workers; each worker processes its elements and returns to the manager, which then reports progress. In your example, the job of length 6 was divided into 2 tasks of three elements each. It looks like worker 2, with elements 4:6, finished first, followed by worker 1 with elements 1:3.
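A rough way to picture this default split is sketched below. It uses base R's parallel::splitIndices() purely as a stand-in (an assumption for illustration; BiocParallel's own splitting code may differ in detail) to divide the element indices into one contiguous task per worker.

```r
library(parallel)

# Stand-in for the default split: a job of length 6 divided into
# 2 contiguous tasks, one per worker.
tasks <- splitIndices(6, 2)
print(tasks)

# A job of length 10 across 5 workers: five tasks of two elements each.
tasks10 <- splitIndices(10, 5)
print(lengths(tasks10))
```

Since each worker only reports back once its whole task is done, the messages for all elements of a task appear together.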

If you want more regular updates (or if execution times vary a lot between tasks), use more tasks.

> myFun = function(i) { Sys.sleep(runif(1) / 5); message(i) }

The following mimics the default behavior. The job of length 10 is divided into 5 tasks (1:2, 3:4, 5:6, 7:8, 9:10) across five workers. All 5 tasks are assigned to workers at once. Worker 5, with elements 9:10, finishes first and the manager reports its messages; then worker 1, with elements 1:2, and so on.

> xx = bplapply(1:10, myFun, BPPARAM=MulticoreParam(5, tasks=5))
9
10
1
2
5
6
3
4
7
8

Here the job of length 10 is divided into 10 tasks (1, 2, 3, ..., 10). Tasks 1:5 are sent to the workers first. Worker 5 completes first and is assigned task 6, then worker 3 finishes and is assigned task 7, and so on. The manager reports progress, messages, etc. as each task completes.

> xx = bplapply(1:10, myFun, BPPARAM=MulticoreParam(5, tasks=10))
5
3
1
2
4
8
6
7
10
9
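A toy model of this dynamic assignment, in base R, may help to see why the reporting order follows completion times rather than element order. simulate_schedule() is a hypothetical helper written for this illustration, not part of BiocParallel, and the durations are made up.

```r
# Hypothetical helper (not a BiocParallel function): hand each task, in order,
# to whichever worker becomes idle first, and record when it finishes.
simulate_schedule <- function(durations, workers) {
    free_at <- numeric(workers)            # time at which each worker is idle
    finish <- numeric(length(durations))   # completion time of each task
    worker <- integer(length(durations))   # worker that ran each task
    for (i in seq_along(durations)) {
        w <- which.min(free_at)            # next idle worker (ties: lowest index)
        free_at[w] <- free_at[w] + durations[i]
        finish[i] <- free_at[w]
        worker[i] <- w
    }
    out <- data.frame(task = seq_along(durations), worker = worker,
                      finish = finish)
    out[order(out$finish), ]               # rows in the order the manager reports
}

# Four single-element tasks with made-up durations, two workers.
sched <- simulate_schedule(c(3, 1, 1, 2), workers = 2)
print(sched)
```

In this sketch the long first task keeps worker 1 busy while worker 2 works through the short ones, so task 1 is reported after tasks 2 and 3.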

The progress bar behaves similarly:

> myFun = function(i) { Sys.sleep(runif(1) / 5); i }
> xx = bplapply(1:100, myFun, BPPARAM=MulticoreParam(5, tasks=100, progressbar=TRUE))

It would be fun and doable to implement immediate updates for all tasks, but this would come with some performance cost.

 

 

Thanks Martin for this explanation!


Hi Martin,

I was wondering whether there is a way to avoid that upfront division of the job, i.e. to have bplapply() work asynchronously (like bpiterate()), with each of the n workers receiving only 1 element of the list at a time. Can this be controlled via the BPPARAM argument, or should I use bpiterate() for that? Thanks!

H.


If the elements to process are in X, then *Param(tasks=length(X)) does this: workers receive one element at a time. bpiterate() is intended for cases where X cannot be computed, or is expensive to compute, ahead of time.
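A minimal sketch of that setting, reusing the style of the example from the question. The function body and the choice of two workers are illustrative assumptions; MulticoreParam is not available on Windows, where another *Param that accepts tasks (such as SnowParam) would be needed instead.

```r
library(BiocParallel)

X <- 1:6

# Illustrative function: emit a message, then return a value.
myFun <- function(x) {
    message("Element ", x, " done")
    x^2
}

# One task per element: each worker receives a single element at a time,
# so the manager can report after every element rather than every chunk.
param <- MulticoreParam(workers = 2, tasks = length(X))
res <- bplapply(X, myFun, BPPARAM = param)
```

The results in res are still returned in the order of X; only the order in which messages are reported depends on which task finishes first.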


Excellent! Thanks.  H.
