Question: bplapply with progressbar
wt215 wrote, 11 months ago:

Hello,

I am replacing foreach with BiocParallel in my package. I wonder whether I can keep the same progress bar behaviour with bplapply as I had with foreach. (This is the same problem as described in https://github.com/Bioconductor/BiocParallel/issues/54 and https://stat.ethz.ch/pipermail/bioc-devel/2017-December/012572.html.)

Firstly I created a simple example in R:

nrow=10000
ncol=500
matrixx=matrix(runif(nrow*ncol),nrow=nrow,ncol=ncol)

 

Using foreach with progressbar:

library(parallel)
library(doSNOW)
library(foreach)
cluster=makeCluster(5,type='SOCK')
registerDoSNOW(cluster)
getDoParWorkers()
iterations<-nrow
pb<-txtProgressBar(max=iterations,style =3)
progress<-function(n)setTxtProgressBar(pb,n)
opts<-list(progress=progress)
BB_parmat<-foreach(geneind=1:dim(matrixx)[1],.combine=c,.options.snow=opts)%dopar%{
  return(mean(matrixx[geneind,]))
}

close(pb)
stopCluster(cluster)

Using bplapply with a progress bar (a potential problem is that the progress bar shows 0% for a long time and then suddenly jumps):

library(BiocParallel)
BPPARAM=SnowParam(workers=5,progressbar = TRUE,type='SOCK')
funnn<-function(geneind,matrixx){
  return(mean(matrixx[geneind,]))
}

suppressWarnings(temp_result<-bplapply(seq(1,dim(matrixx)[1]),funnn,matrixx,BPPARAM=BPPARAM))

 

I prefer the progress bar in the foreach case: it advances by 1% at a time, so I can get a rough idea of the running time of the whole computation. In the bplapply case, the progress bar jumps suddenly.

How can I get the same progress bar behaviour with bplapply as in the foreach case?

Thank you very much!

Best wishes,

Wenhao

 

Answer: bplapply with progressbar
Martin Morgan (United States) wrote, 11 months ago:

The effect can be achieved by setting the number of tasks, e.g.,

BPPARAM=SnowParam(workers=5, tasks = 20, progressbar = TRUE,type='SOCK')

updates the progress bar 20 times.
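
By the same reasoning, tasks = 100 should advance the bar in steps of roughly 1%, which is close to the foreach behaviour asked about. Below is a minimal sketch, assuming BiocParallel is installed. The smaller matrix, the names n_tasks and row_mean, and the choice of two workers are illustrative and not from the original post.

```r
# Sketch only: the parallel part runs only if BiocParallel is installed.
n_tasks <- 100L  # assumed value: 100 tasks, so roughly 1% per progress update

set.seed(1)
mat <- matrix(runif(1000 * 50), nrow = 1000, ncol = 50)

# Hypothetical worker function: mean of row i of matrix m
row_mean <- function(i, m) mean(m[i, ])

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  param <- BiocParallel::SnowParam(workers = 2, tasks = n_tasks,
                                   progressbar = TRUE, type = "SOCK")
  # Row indices are split into n_tasks chunks; the bar updates per chunk
  res <- BiocParallel::bplapply(seq_len(nrow(mat)), row_mean, m = mat,
                                BPPARAM = param)
  res <- unlist(res)
}
```

Whether the smoother bar is worth the extra communication depends on how expensive each row is to process.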

The way bplapply works is that, by default, it splits the initial task list (in your case the sequence of row indexes) into equal components for each worker -- each worker gets 10000 / 5 = 2000 rows. These are sent to the workers, who report back when done. When each worker finishes, the progress bar advances. The progress bar advances in 5 steps, but since the workers all finish at about the same time it seems like the progress bar jumps to complete.

The effect of setting tasks = 20 is to divide the 10000 rows into 20 tasks of 10000 / 20 = 500 rows each, and to send one task (500 rows) to each of the five workers. As each worker finishes, the progress bar is updated and the next 500 rows are sent to that worker. The progress bar moves across the screen more smoothly, but the computation is actually less efficient (because there is more communication between the manager and workers) and takes longer. If most of the time is spent in computation anyway, then the extra cost of communication is small and the trade-off may be worth it.
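
The chunk arithmetic described above can be checked with a few lines of base R. This is only an illustration of an even split. The helper chunk_indices is hypothetical and is not how BiocParallel divides work internally.

```r
# Hypothetical helper (not part of BiocParallel): split n elements into
# n_tasks contiguous chunks of near-equal size.
chunk_indices <- function(n, n_tasks) {
  idx <- seq_len(n)
  chunk_size <- ceiling(n / n_tasks)
  unname(split(idx, ceiling(idx / chunk_size)))
}

n_rows <- 10000
workers <- 5

# Default: one chunk per worker, so only 5 progress updates
default_chunks <- chunk_indices(n_rows, workers)

# tasks = 20: twenty smaller chunks, so 20 progress updates
fine_chunks <- chunk_indices(n_rows, 20)

length(default_chunks)           # number of chunks by default
unique(lengths(default_chunks))  # rows per chunk by default
length(fine_chunks)              # number of chunks with tasks = 20
unique(lengths(fine_chunks))     # rows per chunk with tasks = 20
```

Each chunk corresponds to one progress bar update, which is why more tasks give a smoother bar.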

Usually, of course, it is better to vectorize than to parallelize, so in the trivial example above, simply use rowMeans(matrixx).
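
As a quick check that the vectorized call gives the same result as the per-row function, here is a small base R sketch. The matrix small and the variable names are illustrative only.

```r
set.seed(1)
small <- matrix(runif(20 * 5), nrow = 20, ncol = 5)

# Per-row computation, as in the question's worker function
loop_means <- vapply(seq_len(nrow(small)),
                     function(i) mean(small[i, ]),
                     numeric(1))

# Vectorized equivalent: no workers, no progress bar needed
vec_means <- rowMeans(small)

# Compare up to floating-point tolerance
all.equal(loop_means, vec_means)
```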

(the comment on your question was from a spammer, and was deleted).

 

written 11 months ago by Martin Morgan

Thank you Martin! Would it be possible for bplapply to pass arguments on to the function txtProgressBar? If so, I could specify 'max=10000', so that the progress bar would be element-based.

For this toy example, rowMeans definitely works better. I just used it for illustration.

By the way, BiocParallel is very good, thank you for your work!

 

written 11 months ago by wt215

Under the current scheme, it will not help to make the progress bar element based, because it would be reporting progress on the workers, where no one is looking!

The current implementation does not allow progress bar options to be set; you could open an issue (no promises for an update, though), at https://github.com/Bioconductor/BiocParallel .

written 11 months ago by Martin Morgan

Picking up on this answer, is it possible to have bplapply show a progress bar similar to pbapply when using SerialCoreParam?

written 11 months ago by maltethodberg

I'm not sure that I understand the question; this

> param = SerialParam(progressbar=TRUE)
> res = bplapply(1:10, function(i) Sys.sleep(1), BPPARAM=param)
  |======================================================================| 100%

works?

 

written 11 months ago by Martin Morgan