Dear list, dear guRus,
first of all, many thanks for all the wonderful packages!
While writing code with BiocParallel that should run parallel computations on both Linux and Windows, I noticed the following surprising behaviour, which ultimately produces an error message.
Note that at this point I'm using Windows! When changing BPPARAM from MulticoreParam() to SnowParam(), functions declared earlier may no longer be available. This happens only when a new function is declared within the bplapply() call; the result is an error message.
Eventually I'll set BPPARAM to either MulticoreParam() or SnowParam() according to the platform detected; the rest of the code should remain the same.
So the workaround I see so far consists of avoiding declaring new functions within bplapply().
However, I thought sharing this (to me quite unexpected) behaviour might be useful on this list.
Any comments or hints? Am I doing something wrong in the way I'm calling SnowParam()?
Best greetings,
Wolfgang Raffelsberger
## here an example to illustrate my observations on Windows
library("BiocParallel")
myFun1 <- function(x,val) val+sum(c(x,x^2,x^3))
testMu <- bplapply(1:3,myFun1,val=10,BPPARAM=MulticoreParam(workers=3))            # OK
testSn <- bplapply(1:3,myFun1,val=10,BPPARAM=SnowParam(workers=3,type="SOCK"))     # OK
## but
testMu <- bplapply(1:3,function(v) myFun1(v,val=10),BPPARAM=MulticoreParam(workers=3))          # OK
testSn <- bplapply(1:3,function(v) myFun1(v,val=10),BPPARAM=SnowParam(workers=3,type="SOCK"))   # error !

## output of traceback
> traceback(testSn <- bplapply(1:3,function(v) myFun1(v,val=10),BPPARAM=SnowParam(workers=3,type="SOCK")))
Erreur : BiocParallel errors
  element index: 1, 2, 3
  first error: impossible de trouver la fonction "myFun1"
## (French locale; in English: first error: could not find function "myFun1")

## for completeness - output of sessionInfo
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocParallel_1.8.1

loaded via a namespace (and not attached):
[1] snow_0.4-2     tools_3.3.2    parallel_3.3.2
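For readers who want to keep the anonymous function, here is a minimal sketch of the platform switch described in the post, combined with one possible way around the error. It assumes that passing `myFun1` to the workers as an ordinary argument is acceptable; the argument name `helper` and the variable `param` are made up for this illustration and are not from the original post.

```r
library(BiocParallel)

myFun1 <- function(x, val) val + sum(c(x, x^2, x^3))

# Choose a back-end for the current platform: forking is unavailable on Windows.
param <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = 2, type = "SOCK")
} else {
    MulticoreParam(workers = 2)
}

# 'helper' (hypothetical name) carries myFun1 to the workers as an argument,
# so the anonymous function does not have to look it up in a global
# environment that a SnowParam() worker does not share.
res <- bplapply(1:3, function(v, helper) helper(v, val = 10),
                helper = myFun1, BPPARAM = param)
```

With this pattern the same call should work unchanged under either back-end, because everything the anonymous function needs travels with the call.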
I have been trying to understand how to send data (objects, functions, whatever) to workers and have read several posts about it, but I just can't seem to get it. The explanations always seem impenetrable to me .... I have read about environments etc. I have a function that uses parallel processing inside it, so obviously I want to pass data, arguments etc. from the function call to the workers. I have ended up writing temporary files (with a defined file name) in the "main" part of the function, which are then loaded by the workers, but surely this cannot be the optimal way...
Start your own question and include a SIMPLE example of what you are trying to do -- the description above isn't enough to understand how to help.
Excuse me, I think I got the same error when using SnowParam, whereas MulticoreParam works fine. I have read all the solutions above, but my code is a little complicated: it uses lapply twice, so I don't know how to change it into the style of the example. Could you help me? Thank you very much!
My code is:
result <- BiocParallel::bplapply(1:length(peakgroup.raw), function(peakgroup.num){
  lapply(1:length(speclib), function(speclib.num){
    PKtoDP(peaktable = peakgroup.raw[[peakgroup.num]],
           peaktable.corrected = peakgroup.corrected[[peakgroup.num]],
           scantime = scantime.ms1,
           speclib.single = speclib[[speclib.num]],
           scan.ms1 = scan.ms1,
           scan.ms2 = scan.ms2,
           ms1ppm = ms1ppm,
           ms2ppm = ms2ppm,
           peakgroup.num = peakgroup.num, # for plot
           massrange.ms1 = massrange.ms1,
           mcicutoff = cutoff,
           windows = file.windows)
  })
})
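One way a nested call like this could be restructured for SnowParam() is to hand every object and function the worker needs to bplapply() as a named argument. The sketch below is only an illustration under stated assumptions: `score_one()` is a hypothetical stand-in for `PKtoDP()`, and `peak_groups`, `spec_lib` and `ppm` are toy stand-ins for `peakgroup.raw`, `speclib` and `ms1ppm`; none of them come from the original code.

```r
library(BiocParallel)

# Hypothetical stand-in for PKtoDP(): combines one peak group with one
# library entry into a single number.
score_one <- function(peaks, lib_entry, ppm) sum(peaks) * lib_entry + ppm

# Toy stand-ins for peakgroup.raw, speclib and ms1ppm.
peak_groups <- list(c(1, 2), c(3, 4), c(5, 6))
spec_lib <- list(10, 100)
ppm <- 0.5

# The worker function receives everything it uses as arguments, so nothing
# has to be found in the master's global environment.
worker <- function(group_num, peak_groups, spec_lib, ppm, score_fun) {
    lapply(seq_along(spec_lib), function(lib_num) {
        score_fun(peak_groups[[group_num]], spec_lib[[lib_num]], ppm)
    })
}

result <- bplapply(seq_along(peak_groups), worker,
                   peak_groups = peak_groups, spec_lib = spec_lib,
                   ppm = ppm, score_fun = score_one,
                   BPPARAM = SnowParam(workers = 2, type = "SOCK"))
```

The inner lapply() needs no special treatment: its anonymous function is created on the worker, inside `worker()`, and so sees the arguments that were shipped with the call.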
'Forking' (the approach used for parallelism with MulticoreParam()) is not supported on Windows, so with MulticoreParam() the code is evaluated serially, in the one R process where all functions are known.
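I guess you have a script that defines a function `foo()`, and another function `bar()` that uses `foo()`. Calling bplapply() on `bar()` with SnowParam() then fails because the workers cannot find `foo()`, whereas the same call works with SerialParam() or (on Linux) MulticoreParam().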
The reason is that SnowParam() creates independent R processes where `foo` is not defined, whereas MulticoreParam() and SerialParam() use the same R process. SnowParam() doesn't send the .GlobalEnv (the place where foo() is defined) to the workers, but it does send the body of the function where bplapply() is used (and so on, up to the global environment) to the worker, so defining `foo()` and `bar()` inside the function that calls bplapply() works.
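A minimal sketch of that last point, assuming toy definitions of `foo()` and `bar()` and a hypothetical wrapper `baz()` (these bodies are invented for illustration, not taken from the original answer):

```r
library(BiocParallel)

baz <- function(n) {
    # foo() and bar() are defined inside baz(), so their environment is the
    # evaluation frame of baz() rather than .GlobalEnv. That frame is
    # serialized together with bar() and sent to the SnowParam() workers.
    foo <- function(i) sqrt(i)
    bar <- function(i) foo(i)
    bplapply(seq_len(n), bar, BPPARAM = SnowParam(workers = 2))
}

res <- baz(3)
```

Because `bar()` carries the frame of `baz()` with it, the worker finds `foo()` there without any access to the master's global environment.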
Does that help?