Question: BiocParallel NULL value passed as symbol address
19 months ago, ioannis.vardaxis30 (Norway/Oslo) wrote:

Hey,

I have written a simple Rcpp function; it is the following:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP Test(double &i) {
    double j = std::pow(i, 2.0);  // square the input
    return Rcpp::wrap(j);         // convert the result back to an R numeric
}

I can source the code and run the function Test.

But I want to run it in parallel using BiocParallel::bplapply, like this:

snow <- BiocParallel::SnowParam(workers = 4, type = 'SOCK', progressbar=FALSE)
BiocParallel::register(snow, default=TRUE)

BiocParallel::bplapply(X = as.list(seq_len(1000)), FUN=Test)

Then I get the following error:

Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: NULL value passed as symbol address

If, on the other hand, I register only one worker (workers = 1), the function runs successfully.

What may be causing the problem?

Thank you!

19 months ago, Martin Morgan (United States) wrote:

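'Snow' params use separate processes, so anything that lives only in the memory of the main R session is unknown to the worker processes. In particular, the function created by sourcing the C++ code refers to the compiled routine through a memory address that is valid only in the session that compiled it; when the function is serialized and sent to a worker, that address arrives as NULL, hence "NULL value passed as symbol address". You need to source the code on the worker: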

bplapply(as.list(seq_len(1000)), FUN = function(i) {
    Rcpp::sourceCpp("your.cpp")  # compile and load Test() inside this worker
    Test(i)
})

If sourcing your cpp file is costly, then follow the Rcpp recommendations and create a package; calling library() on a worker does real work only the first time in that process and is a no-op afterwards:

bplapply(as.list(seq_len(1000)), FUN = function(i) {
    library("YourPackage")  # only the first call in each worker loads the package
    Test(i)
})
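Following the Rcpp recommendation to create a package, a minimal sketch might look like the one below. It assumes a working C++ toolchain and write access to the current directory and to the first library in .libPaths(); YourPackage and your.cpp are the placeholder names used above, and the one-line Test() is a simplified stand-in for the question's function, not the original code.

```r
## Write a simplified stand-in for the question's Test() to the placeholder file.
writeLines(c(
    "#include <Rcpp.h>",
    "// [[Rcpp::export]]",
    "double Test(double i) { return i * i; }"
), "your.cpp")

## Create a package around the file; with the default attributes = TRUE,
## Rcpp.package.skeleton() runs compileAttributes() to generate the R wrapper.
Rcpp::Rcpp.package.skeleton("YourPackage", cpp_files = "your.cpp")

## Build and install it so that library("YourPackage") works in every worker.
install.packages("YourPackage", repos = NULL, type = "source")
```

Each worker then only needs library("YourPackage") inside FUN, as in the call above; no compilation happens on the workers.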

ioannis.vardaxis30 wrote:

Hey,

I did as you suggested. If I source the Rcpp code inside the loop it works, but it takes a lot of time, so I need to check whether running it in parallel is worth it at all.

If, on the other hand, I put the Rcpp code in my package and run it that way, I get the following error:

Error in .Call("_pkg_Testing1_Speed_fun_Rcpp", PACKAGE = "pkg",  :
"_pkg_Testing1_Speed_fun_Rcpp" not available for .Call() for package "pkg"

Martin Morgan wrote:

I guess my answer was misleading. The compiled C++ code needs to be loaded on each worker, which means the package in which it is defined needs to be loaded there, so call library("YourPackage") inside FUN either way. Again, the cost is paid once per worker process for each bplapply() call, or it can be amortized further across several calls by starting the workers only once:

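register(SnowParam())
bpstart()                        # start the worker processes once
bplapply(...) ... bplapply(...)  # every call reuses the same running workers
bpstop()                         # shut the workers down at the end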
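The amortization described above can be checked with a small sketch. It assumes BiocParallel is installed and uses the base package tools as a stand-in for the hypothetical "YourPackage". Two SOCK workers are started once; the first bplapply() call attaches the package on each worker, and later calls on the same started workers find it still attached and report the same process ids.

```r
library(BiocParallel)

## Two SOCK workers, started once and reused by every bplapply() call below.
param <- SnowParam(workers = 2, type = "SOCK")
bpstart(param)

## First call: each worker attaches the package ("tools" stands in for the
## hypothetical "YourPackage") and reports its process id.
pids1 <- unlist(bplapply(1:4, function(i) {
    library("tools")
    Sys.getpid()
}, BPPARAM = param))

## Later calls on the same workers: the package is still attached, so a
## library() call at this point would return immediately.
still_attached <- unlist(bplapply(1:4, function(i) {
    "package:tools" %in% search()
}, BPPARAM = param))
pids2 <- unlist(bplapply(1:4, function(i) Sys.getpid(), BPPARAM = param))

bpstop(param)
```

Without bpstart(), each bplapply() call starts and stops its own workers, so the library() cost is paid again on every call.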

But it could be that this is still expensive for the amount of work done in the loops. A final strategy is to put the C++ code in a small package of its own with no other dependencies (assuming it does not call back into your current package), so that loading it on the workers is cheap; but at that point you would really want to know that the original code is written efficiently and that the extra effort is worthwhile.