Question: BiocParallel and variable scope in child processes
12 months ago, dan.gatti0 wrote:

I would like to call a function that parallelizes a computation, and I'd like to use forked workers so that I don't have to export the variables I'm operating on to each worker. I've made a minimal toy example. The following code works fine; all of the workers can see the variable 'a':

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
a = vector("list", 20)
for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
param = MulticoreParam(workers = 4, type = "FORK")
bplapply(1:4, fxn, BPPARAM = param)
rm(list = ls())

However, in my pipeline I need to call a function, create some variables inside it, and then parallelize the work. To my surprise, the forked workers can't see the variable 'a', which is created right above the call to MulticoreParam. Does anyone have any insight into why the forked workers can't see 'a'? And what can I do to make variables visible to them?

library(BiocParallel)
fxn = function(x) { mean(a[[x]]) }
parfxn = function() {
  a = vector("list", 20)
  for(i in 1:20) { a[[i]] = matrix(rnorm(100), 10, 10) }
  param = MulticoreParam(workers = 4, type = "FORK")
  bplapply(1:4, fxn, BPPARAM = param)
}
parfxn()

The error is:

Error: BiocParallel errors
  element index: 1, 2, 3, 4
  first error: object 'a' not found

Thanks in advance.

12 months ago, Martin Morgan wrote:

The approach fails without parallel evaluation too:

> fxn = function(x) mean(a[[1]])
> parfxn = function()  { a = list(1:10); fxn() }
> parfxn()
Error in mean(a[[1]]) : object 'a' not found

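because fxn looks up variables in the environment in which it was defined (here the global environment), rather than the environment from which it was called. The first example works only because 'a' was itself created in the global environment. The better practice is to write functions that do not refer to variables outside their own scope, a practice that is required anyway for Windows or cluster users, whose workers are separate processes that do not inherit the caller's variables: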

fxn = function(x, a) mean(a[[x]])
parfxn = function() {
    ...
    bplapply(1:4, fxn, a, BPPARAM=param)
}
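
For anyone who wants to run this end to end, here is a minimal sketch of the same idea with the elided part filled in from the question's toy data. The `param` argument and its `SerialParam()` default are assumptions added for illustration so the sketch runs on any platform; they are not part of the original code.

```r
library(BiocParallel)

# The worker function receives everything it needs as arguments,
# so it does not depend on where it was defined or called from.
fxn <- function(x, a) mean(a[[x]])

# `param` is a hypothetical argument added for this sketch; SerialParam()
# evaluates in the current process, which makes the example easy to try.
parfxn <- function(param = SerialParam()) {
    a <- vector("list", 20)
    for (i in seq_along(a)) a[[i]] <- matrix(rnorm(100), 10, 10)
    # `a = a` is forwarded through bplapply's `...` to every call of fxn
    bplapply(1:4, fxn, a = a, BPPARAM = param)
}

res <- parfxn()

# On Linux or macOS, forked workers can be requested instead:
# res <- parfxn(MulticoreParam(workers = 4))
```

Passing `a` by name makes it explicit which argument of fxn it fills, and the same call should work unchanged with other back-ends because nothing is looked up outside the function.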

One could also define fxn inside parfxn, but that makes reuse difficult.
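
A rough sketch of that alternative, for comparison. The name `parfxn2` and the `param` argument are hypothetical, chosen only for this illustration, and `SerialParam()` is used as the default so it runs anywhere.

```r
library(BiocParallel)

parfxn2 <- function(param = SerialParam()) {
    a <- vector("list", 20)
    for (i in seq_along(a)) a[[i]] <- matrix(rnorm(100), 10, 10)
    # fxn is defined inside parfxn2, so its enclosing environment is this
    # call's frame and `a` is found there by lexical scoping.
    fxn <- function(x) mean(a[[x]])
    bplapply(1:4, fxn, BPPARAM = param)
}

res2 <- parfxn2()
```

The trade-off is the one mentioned above: fxn now exists only inside parfxn2, so it cannot be reused or tested on its own.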
