Question: BiocParallel and variable scope in child processes
2.9 years ago by
dan.gatti0 wrote:

I would like to call a function that parallelizes a computation, and I'd like to use a FORK cluster so that I don't have to export the variables I'm operating on to each worker. I've made a minimal toy example. The following code works fine; all of the workers can see the variable 'a':

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
a = vector("list", 20)
for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
param = MulticoreParam(workers = 4, type = "FORK")
bplapply(1:4, fxn, BPPARAM = param)
rm(list = ls())

However, in my pipeline I need to call a function that creates some variables and then parallelizes the work. To my surprise, the forked workers can't see the variable 'a', which is created right above the call to bplapply. Does anyone have any insight into why the forked workers can't see 'a'? And what can I do to make variables visible to them?

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
parfxn = function() {
    a = vector("list", 20)
    for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
    param = MulticoreParam(workers = 4, type = "FORK")
    bplapply(1:4, fxn, BPPARAM = param)
}
parfxn()

The error is:

Error: BiocParallel errors
element index: 1, 2, 3, 4
first error: object 'a' not found

Answer: BiocParallel and variable scope in child processes
2.9 years ago by
Martin Morgan ♦♦ 23k (United States) wrote:

The approach fails without parallel evaluation too:

> fxn = function(x) mean(a[[1]])
> parfxn = function()  { a = list(1:10); fxn() }
> parfxn()
Error in mean(a[[1]]) : object 'a' not found

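because R uses lexical scoping: fxn looks up free variables such as 'a' in the environment in which it was defined (here the global environment), not in the environment from which it was called. The better practice is to write functions that do not refer to variables outside their own scope, a practice that is required anyway for Windows or cluster users, whose workers are separate R processes that do not share the manager's memory: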

fxn = function(x, a) mean(a[[x]])
parfxn = function() {
    ...
    bplapply(1:4, fxn, a, BPPARAM=param)
}
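
For readers who want something to paste and run, here is a minimal sketch of the argument-passing approach, reusing the toy data from the question. The `param` argument with its `SerialParam()` default is an assumption added for illustration, so that the sketch runs on any platform; it is not part of the original code.

```r
library(BiocParallel)

# Worker function: everything it needs arrives as an argument,
# so it does not depend on where it was defined.
fxn <- function(x, a) mean(a[[x]])

# 'param' is a hypothetical argument added for this sketch;
# SerialParam() evaluates in the current process.
parfxn <- function(param = SerialParam()) {
    a <- vector("list", 20)
    for (i in seq_along(a)) a[[i]] <- matrix(rnorm(100), 10, 10)
    # 'a' is forwarded to fxn through the '...' of bplapply
    bplapply(1:4, fxn, a = a, BPPARAM = param)
}

res <- parfxn()
```

On Linux or macOS, something like `parfxn(MulticoreParam(workers = 4))` should run the same code on forked workers, and because 'a' is passed explicitly, the function body needs no changes for other back-ends.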

One could also define fxn inside parfxn, but that makes reuse difficult.
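
As a rough sketch of that alternative: when fxn is created inside parfxn, its enclosing environment is the frame of that parfxn call, so 'a' is found by ordinary lexical scoping. The `param` argument and its `SerialParam()` default are again assumptions added for illustration.

```r
library(BiocParallel)

# 'param' is a hypothetical argument added for this sketch.
parfxn <- function(param = SerialParam()) {
    a <- vector("list", 20)
    for (i in seq_along(a)) a[[i]] <- matrix(rnorm(100), 10, 10)
    # fxn is defined here, so its enclosing environment is this
    # call's frame, which is where 'a' lives.
    fxn <- function(x) mean(a[[x]])
    bplapply(1:4, fxn, BPPARAM = param)
}

res2 <- parfxn()
```

The trade-off is the one noted above: fxn now exists only inside parfxn and cannot be called or tested on its own.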