Hey,
I have written a simple function in an Rcpp package; the function is the following:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP Test(double &i){
  double j=std::pow(i,2.0);
  return Rcpp::wrap(j);
}
I can source the code and run the function Test.
But I want to run it in parallel using the BiocParallel::bplapply function, like:
snow <- BiocParallel::SnowParam(workers = 4, type = 'SOCK', progressbar=FALSE)
BiocParallel::register(snow, default=TRUE)
BiocParallel::bplapply(X = as.list(seq_len(1000)), FUN=Test)
Then I get the following error:
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: NULL value passed as symbol address
If on the other hand I register only one core (workers=1) the function runs successfully.
What may be causing the problem?
Thank you!
Hey,
I did as you suggested. If I source the Rcpp code in the loop it works, but it takes a lot of time and I need to check if it is worth running it in parallel.
If on the other hand I place the Rcpp code in my pkg and try to run it this way I get the following error:
I guess my answer was misleading. The C library needs to be loaded on the worker, and that means the package in which it is defined needs to be loaded, so
library("YourPackage")
either way. Again, the cost is once per process for each bplapply()
or further amortized by
register(SnowParam())
bpstart()
bplapply(...)
...
bplapply(...)
...
bpstop()
But it could be that this is still expensive for the amount of work to be done in the loops. A final strategy would be to put the C++ code in a package without other dependencies (assuming that it doesn't make calls back into your current package), but at that point one would really want to know that the original code was written efficiently and the extra effort was worthwhile.
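To make the first two suggestions concrete, here is a rough sketch of how they might fit together. It assumes Test() is exported from an installed package, written here with the placeholder name "YourPackage" used above. The wrapper name run_test is made up for illustration, and the param object is passed explicitly through BPPARAM rather than registered. It will not run until "YourPackage" is replaced with a real package name.

```r
library(BiocParallel)

# "YourPackage" is a placeholder; substitute the package that actually
# contains the compiled Test() function.
# run_test is a hypothetical wrapper, not part of any package.
run_test <- function(i) {
    library("YourPackage")  # loads the package's shared library on the worker
    Test(i)
}

param <- SnowParam(workers = 4, type = "SOCK", progressbar = FALSE)

bpstart(param)  # launch the worker processes once
res1 <- bplapply(as.list(seq_len(1000)), run_test, BPPARAM = param)
res2 <- bplapply(as.list(seq_len(1000)), run_test, BPPARAM = param)
bpstop(param)   # shut the workers down when finished
```

Calling library() on a package that is already attached is cheap, so repeating it for every element should cost little after the first call in each worker. Because the workers are started once with bpstart(), the second bplapply() reuses the same processes instead of launching new ones.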