Question: bplapply with progressbar
9 months ago, wt2150 wrote:

Hello,

I am replacing foreach with BiocParallel in my package, and I wonder whether I can keep the same progress-bar behavior with bplapply that I have with foreach. (This is the same problem as described in https://github.com/Bioconductor/BiocParallel/issues/54 and https://stat.ethz.ch/pipermail/bioc-devel/2017-December/012572.html.)

First, here is a simple example in R:

nrow=10000
ncol=500
matrixx=matrix(runif(nrow*ncol),nrow=nrow,ncol=ncol)

Using foreach with a progress bar:

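library(parallel)
library(doSNOW)
library(foreach)

# Start 5 socket workers and register them as the foreach backend
cluster <- makeCluster(5, type = 'SOCK')
registerDoSNOW(cluster)
getDoParWorkers()

# One progress-bar step per row; doSNOW calls progress(n) on the manager
# with the number of iterations completed so far
iterations <- nrow
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

BB_parmat <- foreach(geneind = 1:dim(matrixx)[1], .combine = c,
                     .options.snow = opts) %dopar% {
  return(mean(matrixx[geneind, ]))
}

close(pb)
stopCluster(cluster)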

Using bplapply with a progress bar (the problem is that the bar shows 0% for a long time and then suddenly jumps forward):

library(BiocParallel)
BPPARAM <- SnowParam(workers = 5, progressbar = TRUE, type = 'SOCK')
funnn <- function(geneind, matrixx) {
  return(mean(matrixx[geneind, ]))
}

suppressWarnings(temp_result <- bplapply(seq(1, dim(matrixx)[1]), funnn, matrixx, BPPARAM = BPPARAM))

I prefer the progress bar in the foreach case, which advances 1% at a time, so that I get a rough idea of how long the whole computation will take. In the bplapply case, the progress bar jumps suddenly.

My question is: how can I get the same progress bar as in the foreach case using bplapply?

Thank you very much!

Best wishes,

Wenhao

9 months ago, Martin Morgan wrote:

The effect can be achieved by setting the number of tasks, e.g.,

BPPARAM=SnowParam(workers=5, tasks = 20, progressbar = TRUE,type='SOCK')

updates the progress bar 20 times.
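A self-contained version of the call above, as a sketch: it assumes BiocParallel is installed, uses a smaller 10000 x 50 matrix than the question so it runs quickly, and `row_mean` is simply the question's `funnn` under a more descriptive (hypothetical) name.

```r
library(BiocParallel)

set.seed(1)
matrixx <- matrix(runif(10000 * 50), nrow = 10000, ncol = 50)

# Mean of one row; `matrixx` is forwarded to each call through bplapply's `...`.
row_mean <- function(geneind, matrixx) mean(matrixx[geneind, ])

# tasks = 20: the 10000 row indices are split into 20 tasks of 500 rows,
# and the manager advances the progress bar once per completed task.
param <- SnowParam(workers = 5, tasks = 20, progressbar = TRUE, type = "SOCK")

res <- bplapply(seq_len(nrow(matrixx)), row_mean, matrixx = matrixx,
                BPPARAM = param)
res <- unlist(res)
```

According to the SnowParam documentation, `tasks` can range from 1 up to the length of X; a value equal to `nrow(matrixx)` would make every row its own task (and so its own progress-bar step), at the cost of one manager-worker round trip per row.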

The way bplapply works is that, by default, it splits the initial task list (in your case the sequence of row indexes) into equal components for each worker -- each worker gets 10000 / 5 = 2000 rows. These are sent to the workers, who report back when done. When each worker finishes, the progress bar advances. The progress bar advances in 5 steps, but since the workers all finish at about the same time it seems like the progress bar jumps to complete.
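The default split can be illustrated with a few lines of base R. The helper `chunk_indices` is hypothetical and only reproduces the arithmetic described above (10000 rows, 5 workers, one chunk per worker); it is not BiocParallel's internal code.

```r
# Hypothetical helper (not part of BiocParallel): split the indices 1..n
# into `n_chunks` contiguous chunks whose sizes differ by at most one.
chunk_indices <- function(n, n_chunks) {
  split(seq_len(n), ceiling(seq_len(n) * n_chunks / n))
}

# Default behaviour described above: one chunk (task) per worker.
default_tasks <- chunk_indices(10000, 5)
length(default_tasks)    # 5 tasks, so the bar can move only 5 times
lengths(default_tasks)   # 2000 rows in each task
```

Because all five chunks are the same size and the per-row work is uniform, the five workers report back at nearly the same moment, which is why the bar appears to jump from 0% to 100%.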

The effect of setting tasks = 20 is to divide the 10000 rows into 20 tasks of 10000 / 20 = 500 rows each. One 500-row task is sent to each of the five workers; as each worker finishes, the progress bar is updated and that worker is sent the next 500-row task. The progress bar moves across the screen more smoothly, but the computation is actually less efficient (because there is more communication between the manager and the workers) and takes longer. If most of the time is spent in computation anyway, the extra cost of communication is small and the trade-off may be worth it.
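The manager-side behaviour can be mimicked serially to see why more tasks means a smoother bar. This is a sketch under stated assumptions: `run_with_task_progress` is a hypothetical function with no real workers, and it is not how BiocParallel is implemented; it only reproduces the rule that the bar moves once per completed task.

```r
# Serial simulation (no workers, not BiocParallel's code) of a manager that
# can only update the progress bar when a whole task comes back.
run_with_task_progress <- function(mat, n_tasks) {
  n <- nrow(mat)
  # Split row indices into `n_tasks` contiguous, near-equal chunks.
  tasks <- split(seq_len(n), ceiling(seq_len(n) * n_tasks / n))
  pb <- txtProgressBar(min = 0, max = length(tasks), style = 3)
  on.exit(close(pb))
  results <- vector("list", length(tasks))
  for (t in seq_along(tasks)) {
    # The "worker" computes the mean of every row in this task ...
    results[[t]] <- vapply(tasks[[t]], function(i) mean(mat[i, ]), numeric(1))
    # ... and only then does the manager advance the bar by one step.
    setTxtProgressBar(pb, t)
  }
  unlist(results, use.names = FALSE)
}

set.seed(1)
m <- matrix(runif(10000 * 50), nrow = 10000, ncol = 50)
res5  <- run_with_task_progress(m, n_tasks = 5)   # bar moves in 5 jumps of 20%
res20 <- run_with_task_progress(m, n_tasks = 20)  # bar moves in 20 steps of 5%
```

In this serial sketch the extra tasks cost almost nothing; with real workers each additional task is another round trip between manager and worker, which is the trade-off described above.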

Of course, it is usually better to vectorize than to parallelize, so in the trivial example above simply use rowMeans(matrixx).
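For the toy example, the vectorized call gives the same answer as the row-by-row loop that the foreach and bplapply versions perform. The sketch below uses only base R; wrapping each computation in `system.time()` is an easy way to compare the two on your own machine.

```r
set.seed(1)
matrixx <- matrix(runif(10000 * 500), nrow = 10000, ncol = 500)

# Row-by-row: what the foreach/bplapply examples compute, one row per call.
loop_means <- vapply(seq_len(nrow(matrixx)),
                     function(geneind) mean(matrixx[geneind, ]),
                     numeric(1))

# Vectorized: a single call computes all 10000 row means.
vec_means <- rowMeans(matrixx)

isTRUE(all.equal(loop_means, vec_means))  # TRUE
```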

(The comment on your question was from a spammer and has been deleted.)

Thank you, Martin! Is it possible for bplapply to pass arguments on to txtProgressBar? Then I could specify max = 10000 so that the progress bar is element based.

For this toy example rowMeans definitely works better; I just used it for illustration.

By the way, BiocParallel is very good, thank you for your work!

Under the current scheme, it will not help to make the progress bar element based, because it would be reporting progress on the workers, where no one is looking!

The current implementation does not allow progress bar options to be set; you could open an issue (no promises for an update, though), at https://github.com/Bioconductor/BiocParallel .

Picking up on this answer, is it possible to have bplapply show a progress bar similar to pbapply's when using SerialParam?


I'm not sure that I understand the question; doesn't this already do what you want?

> param = SerialParam(progressbar = TRUE)
> res = bplapply(1:10, function(i) Sys.sleep(1), BPPARAM = param)
|======================================================================| 100%