I need to run a parallel computation over a large, read-only local variable. I am using 'MulticoreParam', expecting the forked workers to share memory for efficiency, but the performance looks very strange to me. I ran some tests which confused me further: test1 works as expected; test4 is what I am actually aiming for, but it starts the workers without fully using them (only 2~3 cores are busy). test2 and test3 provide some hints, but I cannot figure out what is going on.
Any help is appreciated!
UPDATE: The phenomenon below does not show up in R 3.6.1 or R 4.0.3, as noted in the comment below.
require(BiocParallel)
register(MulticoreParam(20))

rN <- 30000
cN <- 5000
X <- matrix(rnorm(rN * cN), ncol = cN)  # large read-only matrix to be used by the workers

## test1: the worker function reads the global X directly
test1 <- function() {
  ids <- sample(LETTERS[1:20], cN, replace = TRUE)
  message("parallel")
  tmp <- bplapply(LETTERS[1:20], function(id) {
    y <- X[, ids %in% id, drop = FALSE]
    return(apply(y, 1, sum, na.rm = TRUE) / sum(y))
  })
  return(tmp)
}
## test2: a local copy X1 <- X exists in the calling frame, but the worker still reads the global X
test2 <- function() {
  X1 <- X
  ids <- sample(LETTERS[1:20], cN, replace = TRUE)
  message("parallel")
  tmp <- bplapply(LETTERS[1:20], function(id) {
    y <- X[, ids %in% id, drop = FALSE]
    return(apply(y, 1, sum, na.rm = TRUE) / sum(y))
  })
  return(tmp)
}
## test3: the local copy is created and removed again before bplapply; the worker reads X
test3 <- function() {
  X1 <- X
  rm(X1)
  ids <- sample(LETTERS[1:20], cN, replace = TRUE)
  message("parallel")
  tmp <- bplapply(LETTERS[1:20], function(id) {
    y <- X[, ids %in% id, drop = FALSE]
    return(apply(y, 1, sum, na.rm = TRUE) / sum(y))
  })
  return(tmp)
}
## test4: the worker function reads the local copy X1 (the intended usage)
test4 <- function() {
  X1 <- X
  ids <- sample(LETTERS[1:20], cN, replace = TRUE)
  message("parallel")
  tmp <- bplapply(LETTERS[1:20], function(id) {
    y <- X1[, ids %in% id, drop = FALSE]
    return(apply(y, 1, sum, na.rm = TRUE) / sum(y))
  })
  return(tmp)
}
message("test1")
print(system.time(res <- test1()))
message("test2")
print(system.time(res <- test2()))
message("test3")
print(system.time(res <- test3()))
message("test4")
print(system.time(res <- test4()))
And the output:
Loading required package: BiocParallel
test1
parallel
   user  system elapsed
  0.064   0.066   0.603
test2
parallel
   user  system elapsed
  6.302  12.067  18.534
test3
parallel
   user  system elapsed
  0.052   0.059   0.549
test4
parallel
   user  system elapsed
  5.608  13.019  19.130
And the session info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)
Matrix products: default
BLAS: /.../pkg/R/3.5.1/centos6/lib64/R/lib/libRblas.so
LAPACK: /.../pkg/R/3.5.1/centos6/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.16.6
loaded via a namespace (and not attached):
[1] compiler_3.5.1 parallel_3.5.1
Thank you for the reply! And yes, it is for sc/sn datasets. Using rowSums is indeed faster than apply, thanks for the suggestion. I have modified the original post to include the running times and sessionInfo, and I also changed cN to 5000. The weird part is test2 and test4, which take a long time; increasing cN has an exponential impact on their running time. However, that does not seem to happen on your system, which prompted me to test on a newer version of R (3.6), and there the weird behaviour is gone: all four tests take a similar amount of time and run faster!
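For reference, this is a minimal sketch of the rowSums() variant I tried, reusing the X, cN, and registered MulticoreParam from above (the function name test1_rowsums is just illustrative):

## Same computation as test1, but with rowSums() instead of the per-row apply()
test1_rowsums <- function() {
  ids <- sample(LETTERS[1:20], cN, replace = TRUE)
  tmp <- bplapply(LETTERS[1:20], function(id) {
    y <- X[, ids %in% id, drop = FALSE]
    rowSums(y, na.rm = TRUE) / sum(y)
  })
  return(tmp)
}
print(system.time(res <- test1_rowsums()))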