Question: BiocParallel: messages shown only when processes finished
15 months ago, Johannes Rainer (Italy) wrote:

Hi,

I am not sure whether this is intentional, but I noticed that output from calls to message and cat inside functions executed by bplapply is shown only after the bplapply call has finished. I also tried calling flush.console() after each message call, but that did not help either.

library(BiocParallel)

myFun <- function(x) {
    message("Element ", x, " start")
    Sys.sleep(2)
    message("Element ", x, " end")
}

register(SerialParam())
tmp <- bplapply(1:6, myFun)
Element 1 start
Element 1 end
Element 2 start
Element 2 end
Element 3 start
Element 3 end
Element 4 start
Element 4 end
Element 5 start
Element 5 end
Element 6 start
Element 6 end
## Works nicely

## Using MulticoreParam:
register(MulticoreParam(2))
tmp <- bplapply(1:6, myFun)
Element 4 start
Element 4 end
Element 5 start
Element 5 end
Element 6 start
Element 6 end
Element 1 start
Element 1 end
Element 2 start
Element 2 end
Element 3 start
Element 3 end

## Console output is shown only after the bplapply call has finished

Is there a way to enable the immediate output of message or cat calls?

Additionally, at least on my system, the progress bar does not advance smoothly either: it shows 0% and then, only after everything has finished, jumps to 50% and 100%.

thanks, jo

My sessionInfo:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.6.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocParallel_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0

modified 15 months ago by Martin Morgan • written 15 months ago by Johannes Rainer
15 months ago, Martin Morgan (United States) wrote:

The model is that messages and progress are reported each time a 'worker' reports back to the 'manager'. By default, the overall 'job' passed to bplapply is divided as evenly as possible into a list of n tasks, where n is the number of workers. Each task consists of some number of elements. The tasks are sent to the workers; each worker processes its elements and returns to the manager, which then reports progress. In your example, the job of length 6 was divided into 2 tasks of three elements each. It looks like worker 2, with elements 4:6, finished first, followed by worker 1, with elements 1:3.
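The even split described above can be sketched in base R. This is only an illustration: it uses splitIndices() from the parallel package, which is an assumption for demonstration purposes and not necessarily the routine BiocParallel uses internally.

```r
library(parallel)

# Split a job of 6 elements into 2 tasks, one per worker.
# The result is a list of index vectors: elements 1:3 and elements 4:6.
chunks_6_2 <- splitIndices(6, 2)
print(chunks_6_2)

# Split a job of 10 elements into 5 tasks of two elements each.
chunks_10_5 <- splitIndices(10, 5)
print(lengths(chunks_10_5))
```

Each list entry corresponds to one task; a worker only reports back to the manager once every element in its task has been processed.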

If you want more regular updates (or if individual tasks have very variable execution times), use more tasks.

> myFun = function(i) { Sys.sleep(runif(1) / 5); message(i) }

The following mimics the default behavior. The job of length 10 is divided into 5 tasks (elements 1:2, 3:4, 5:6, 7:8, 9:10), one per worker. All 5 tasks are assigned to workers at once. Worker 5, with elements 9:10, finishes first and the manager reports its messages, then worker 1, with elements 1:2, and so on.

> xx = bplapply(1:10, myFun, BPPARAM=MulticoreParam(5, tasks=5))
9
10
1
2
5
6
3
4
7
8

Here the job of length 10 is divided into 10 tasks of one element each: 1, 2, 3, ..., 10. Tasks 1:5 are sent to the workers first. Worker 5 completes first and is assigned task 6, then worker 3 finishes and is assigned task 7, and so on. The manager reports progress, messages, etc. as each task completes.

> xx = bplapply(1:10, myFun, BPPARAM=MulticoreParam(5, tasks=10))
5
3
1
2
4
8
6
7
10
9

The progress bar behaves similarly:

> myFun = function(i) { Sys.sleep(runif(1) / 5); i }
> xx = bplapply(1:100, myFun, BPPARAM=MulticoreParam(5, tasks=100, progressbar=TRUE))


It would be fun and doable to implement immediate updates for all tasks, but this would come with some performance cost.
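Until something like that exists, one possible workaround is to have each worker append its messages to a shared log file, which can be watched from another terminal while bplapply is still running. This is a sketch and not a BiocParallel feature: the logfile argument and the temporary file path are hypothetical names introduced for illustration.

```r
library(BiocParallel)

# Hypothetical log file; any path writable by all workers would do.
logfile <- tempfile(fileext = ".log")

myFun <- function(x, logfile) {
    # Each cat() call appends one line to the file right away,
    # independently of when the worker reports back to the manager.
    cat("Element ", x, " start\n", sep = "", file = logfile, append = TRUE)
    Sys.sleep(0.2)
    cat("Element ", x, " end\n", sep = "", file = logfile, append = TRUE)
    x
}

# Extra arguments (here logfile) are passed on to myFun by bplapply.
tmp <- bplapply(1:6, myFun, logfile = logfile, BPPARAM = MulticoreParam(2))

# After the call, the log holds one start and one end line per element.
log_lines <- readLines(logfile)
```

While the job runs, the file can be followed with a tool such as tail -f. Lines from different workers will be interleaved in the order in which they were written.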

Thanks Martin for this explanation!

Hi Martin,

I was wondering whether there is a way to avoid the upfront division of the job, i.e., to have bplapply() work asynchronously (like bpiterate()), so that each of the n workers receives only 1 element of the list at a time. Can this be controlled via the BPPARAM argument, or should I use bpiterate() for that? Thanks!

H.

written 15 months ago by Hervé Pagès

If the elements to process are in X, then *Param(tasks=length(X)) does this: each worker receives one element at a time. bpiterate() is intended for cases where X cannot be computed, or is expensive to compute, ahead of time.

written 15 months ago by Martin Morgan
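As a minimal sketch of the tasks=length(X) suggestion, assuming a platform where MulticoreParam is available (it relies on forking, so it is not supported on Windows); the worker count and the toy function are arbitrary choices for illustration:

```r
library(BiocParallel)

X <- 1:6

myFun <- function(x) {
    message("Element ", x, " done")
    x^2
}

# One task per element: each worker receives a single element at a time
# and reports back to the manager after finishing it.
param <- MulticoreParam(workers = 2, tasks = length(X))
res <- bplapply(X, myFun, BPPARAM = param)
```

The results are returned in the order of X regardless of the order in which the tasks finish; only the messages appear in completion order.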

Excellent! Thanks.  H.

written 15 months ago by Hervé Pagès