I would like to call a function that parallelizes a computation, and I'd like to use FORK-based workers so that I don't have to export the variables I'm operating on to each worker. I've made a minimal toy example. The following code works fine; all of the workers can see the variable 'a':
```r
library(BiocParallel)

fxn = function(x) {
  mean(a[[x]])
}

a = vector("list", 20)
for (i in 1:20) {
  a[[i]] = matrix(rnorm(100), 10, 10)
}

param = MulticoreParam(workers = 4, type = "FORK")
bplapply(1:4, fxn, BPPARAM = param)

rm(list = ls())
```
However, in my pipeline I need to call a function that creates some variables and then parallelizes the work. To my surprise, the forked workers can't see the variable 'a', even though it is created right above the call to bplapply(). Does anyone know why the forked workers can't see 'a'? And what can I do to make variables visible to them?
```r
library(BiocParallel)

fxn = function(x) {
  mean(a[[x]])
}

parfxn = function() {
  a = vector("list", 20)
  for (i in 1:20) {
    a[[i]] = matrix(rnorm(100), 10, 10)
  }
  param = MulticoreParam(workers = 4, type = "FORK")
  bplapply(1:4, fxn, BPPARAM = param)
}

parfxn()
```
The error is:
```
Error: BiocParallel errors
  element index: 1, 2, 3, 4
  first error: object 'a' not found
```
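A minimal base-R sketch suggests the cause is R's lexical scoping rather than forking: `fxn` is defined at top level, so when it evaluates `a` it searches its own frame and then `environment(fxn)` (the global environment), never the frame of the function that called it. In the first example `a` also lives in the global environment, which is why the lookup succeeds there. The sketch below reproduces the same error with plain `lapply()` and shows two possible workarounds. The names `parfxn_serial`, `fxn_arg`, `parfxn_arg`, `parfxn_closure` and `fxn_local` are hypothetical, introduced only for illustration, and base `lapply()` stands in for `bplapply()` so the sketch runs without BiocParallel installed.

```r
# Hypothetical sketch; lapply() stands in for bplapply() so it runs anywhere.
fxn <- function(x) {
  mean(a[[x]])  # `a` is looked up in environment(fxn), i.e. the global env
}

# Reproduces "object 'a' not found" with no workers involved at all
# (assuming no global `a` exists).
parfxn_serial <- function() {
  a <- lapply(1:20, function(i) matrix(rnorm(100), 10, 10))  # local to this call
  lapply(1:4, fxn)
}

# Workaround 1: pass `a` explicitly. lapply() forwards extra arguments through
# `...` to FUN, as does bplapply(), e.g.
# bplapply(1:4, fxn_arg, a = a, BPPARAM = param)
fxn_arg <- function(x, a) {
  mean(a[[x]])
}

parfxn_arg <- function() {
  a <- lapply(1:20, function(i) matrix(rnorm(100), 10, 10))
  lapply(1:4, fxn_arg, a = a)
}

# Workaround 2: define the worker function inside the caller, so its
# enclosing environment is the caller's frame, where `a` lives.
parfxn_closure <- function() {
  a <- lapply(1:20, function(i) matrix(rnorm(100), 10, 10))
  fxn_local <- function(x) mean(a[[x]])
  lapply(1:4, fxn_local)
}
```

Passing `a` through `...` is probably the less fragile option, since it does not depend on where the worker function happens to be defined.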
Thanks in advance.