I recently noticed that bplapply seems to work differently from older versions.
For example, suppose I want to fit a number of linear models with lm:
library(BiocParallel)
# simulate some data for 10000 regressions
N = 10000
nobs = 100
X = matrix( rnorm(nobs*3), ncol=3 )
Y = matrix(rnorm(N*nobs), nrow=N)
numCores = 4
# a function to loop through the regressions
foo <- function(i) {
    lm(Y[i,]~X)$coef[2]
}
# bplapply is slow: it takes over 6 seconds on my laptop (MacBook Pro)
mParam = MulticoreParam(workers=numCores, progressbar=FALSE)
t0 = Sys.time()
beta = bplapply(1:N, foo, BPPARAM = mParam)
t1 = Sys.time()
difftime(t1, t0, units="secs")
# mclapply takes about 1 second
library(doParallel)
t0 = Sys.time()
beta = mclapply(1:N, foo, mc.cores = numCores)
t1 = Sys.time()
difftime(t1, t0, unit="secs")
I'm not exactly sure what's going on, since bplapply worked perfectly fine for this before. I read the manual, and it seems bplapply now passes each row of Y to the workers separately, instead of dividing all the data into a few chunks. It seems bpvec can achieve that, but it requires extra programming. Can someone suggest a good solution? Or should I just switch to mclapply?
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.1
Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     
other attached packages:
[1] doParallel_1.0.16   iterators_1.0.13    foreach_1.5.1      
[4] BiocParallel_1.28.0
loaded via a namespace (and not attached):
[1] compiler_4.1.2   codetools_0.2-18

I tried the bpvec function a little bit, but got a mysterious error
It tells me "Error: length(FUN(X)) not equal to length(X)". This is strange since my function "foo" returns a vector of the same length as the input.
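For context on that error message: `bpvec()` passes `FUN` a whole chunk of `X` at once and checks that the result has one element per input element. The following is a minimal sketch of a function that satisfies that contract. It is not code from this thread; `foo_chunk` is a hypothetical name, and `N` is reduced so the sketch runs quickly.

```r
library(BiocParallel)

set.seed(1)
nobs <- 100
N <- 200                                   # reduced from 10000 for a quick run
X <- matrix(rnorm(nobs * 3), ncol = 3)
Y <- matrix(rnorm(N * nobs), nrow = N)

# Hypothetical chunk-aware function: `idx` is a vector of row indices
# (e.g. 6:10), and exactly one coefficient is returned per index.
foo_chunk <- function(idx) {
    vapply(idx, function(i) unname(coef(lm(Y[i, ] ~ X))[2]), numeric(1))
}

# bpvec() splits seq_len(N) into chunks, calls foo_chunk() on each chunk,
# and concatenates the per-chunk results.
beta <- bpvec(seq_len(N), foo_chunk, BPPARAM = MulticoreParam(workers = 2))
```

The key point is that the function iterates over the indices it is given, rather than over `1:length(idx)` or some fixed range, so `length(foo_chunk(6:10))` is 5.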
I found that there is a _tasks_ parameter in _MulticoreParam_ that can evenly divide and distribute the jobs. However, it does not make it any faster.
I found that using SnowParam with FORK works well:
So what's going on with MulticoreParam?
If I remember correctly, there was a change in how the environment of R was loaded in the different background jobs. You should probably read the BiocParallel NEWS, it might be related to:
But you might have more luck getting an answer on the bioc-devel mailing list or the Slack.
This is a post from 13 months ago; sorry it slipped through at the time. Here are current timings for me
The default `tasks` already divides the work in the best way for this computation (and is the same strategy across each of the `*Param`).

`bpvec()` is not being used correctly. In a simplified example, if `X = 1:10` were distributed over two workers, the first worker might receive `1:5` and the second worker `6:10`. `foo(1:5)` is ok; `length(foo(1:5)) == length(1:5)`. `foo(6:10)` is problematic because as written `length(foo(6:10))` is 10 and not 5. An updated version is

This takes about 2.24 seconds when run with `MulticoreParam()` -- `bpvec()` would be even more useful if the implementation of `foo()` itself were 'vectorized', as e.g., in the trivial example in the help page `?bpvec` where the body is `sqrt(idx)` rather than an iteration as here.

This catches us up to the question -- why is MulticoreParam being slow? In some respects it is implemented differently from `mclapply()` and so performance is not expected to be identical, but I'm surprised that `SnowParam(type = "FORK")` is relatively fast; I will look into this...

UPDATE: The reason for the slowness is because `MulticoreParam()` includes a 'garbage collection' parameter `force.GC` set to `TRUE` by default. Setting this to `FALSE` recovers the performance of `mclapply()`.

There is a (pretty hard to follow grammatically, and not mentioning performance trade-offs) mention in the NEWS file
and in the commit log. As mentioned in the commit, the motivation was to manage memory better in response to this pull request. I will rethink the approach and have opened an issue to track this.
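To make the `force.GC` point concrete, here is a minimal sketch that runs the same `bplapply()` call with garbage collection forced and not forced. It assumes a BiocParallel version in which `MulticoreParam()` accepts the `force.GC` argument (as in the 1.28.0 release shown in the question's `sessionInfo()`); the names `param_gc` and `param_nogc` are illustrative, and `N` is reduced so the comparison finishes quickly. Timings will vary by machine, so none are shown.

```r
library(BiocParallel)

set.seed(1)
nobs <- 100
N <- 2000                                  # reduced from 10000 for a quick run
X <- matrix(rnorm(nobs * 3), ncol = 3)
Y <- matrix(rnorm(N * nobs), nrow = N)

foo <- function(i) unname(coef(lm(Y[i, ] ~ X))[2])

# Same back end, differing only in whether garbage collection is forced.
param_gc   <- MulticoreParam(workers = 2, force.GC = TRUE)
param_nogc <- MulticoreParam(workers = 2, force.GC = FALSE)

t_gc   <- system.time(b_gc   <- bplapply(seq_len(N), foo, BPPARAM = param_gc))
t_nogc <- system.time(b_nogc <- bplapply(seq_len(N), foo, BPPARAM = param_nogc))

# Elapsed wall-clock seconds for each setting.
print(c(force.GC_TRUE  = t_gc[["elapsed"]],
        force.GC_FALSE = t_nogc[["elapsed"]]))
```

Only the timing should differ between the two runs; the returned coefficients are the same either way.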
Returning to the original task of extracting coefficients from repeatedly fitting the same model matrix, note that it is much more efficient to use `lm.fit()` than to use `lm()`; I have

For a 10x speedup WITHOUT parallel evaluation.
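As a rough illustration of the `lm.fit()` suggestion (a sketch under stated assumptions, not the code from the original answer): `lm.fit()` takes the design matrix directly, so the intercept column that the formula interface adds automatically has to be bound on by hand. The names `X1`, `foo_lm`, and `foo_lmfit` are illustrative, and `N` is reduced so the sketch runs quickly.

```r
set.seed(1)
nobs <- 100
N <- 500                                   # reduced from 10000 for a quick run
X <- matrix(rnorm(nobs * 3), ncol = 3)
Y <- matrix(rnorm(N * nobs), nrow = N)

# lm() builds the model frame and design matrix on every call;
# lm.fit() skips that and works on a prebuilt design matrix.
X1 <- cbind(1, X)                          # explicit intercept column

foo_lm    <- function(i) unname(coef(lm(Y[i, ] ~ X))[2])
foo_lmfit <- function(i) unname(lm.fit(X1, Y[i, ])$coefficients[2])

t_lm    <- system.time(b_lm    <- vapply(seq_len(N), foo_lm, numeric(1)))
t_lmfit <- system.time(b_lmfit <- vapply(seq_len(N), foo_lmfit, numeric(1)))

# Both approaches should give the same coefficients.
stopifnot(isTRUE(all.equal(b_lm, b_lmfit)))
print(c(lm = t_lm[["elapsed"]], lm.fit = t_lmfit[["elapsed"]]))
```

Because the design matrix is identical for every regression, building it once outside the loop is where most of the saving comes from.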
Thanks for the answers. I'll look into this.
For the lm example: I was not actually trying to run lm; I just used it as a dummy example to try out the parallel functions. Of course, if I really wanted to fit many linear models I would use
solve(t(X)%*%X) %*% (t(X)%*%Y), which will be 100 times faster.
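A minimal sketch of that closed-form approach, using the data layout from the question. Two adjustments are assumptions made here so that the result matches `lm()`: `Y` stores one response per row, so it is transposed to put one response per column, and an intercept column is added to `X`. `X1` and `B` are illustrative names.

```r
set.seed(1)
nobs <- 100
N <- 1000
X <- matrix(rnorm(nobs * 3), ncol = 3)
Y <- matrix(rnorm(N * nobs), nrow = N)     # one response per ROW, as in the question

X1 <- cbind(1, X)                          # intercept column, to match lm(Y[i, ] ~ X)

# Solve the normal equations (X1'X1) B = X1' t(Y) for all N responses at once.
# crossprod(A, B) computes t(A) %*% B; B is a 4 x N matrix of coefficients.
B <- solve(crossprod(X1), crossprod(X1, t(Y)))

# Row 2 holds the coefficient on the first column of X for every regression.
beta <- B[2, ]

# Spot-check a few regressions against lm().
check <- vapply(1:5, function(i) unname(coef(lm(Y[i, ] ~ X))[2]), numeric(1))
stopifnot(isTRUE(all.equal(unname(beta[1:5]), check)))
```

Solving the system with `solve(A, b)` rather than forming the explicit inverse `solve(A)` is generally the more numerically stable choice.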