Load packages when using BatchtoolsParam
@a9c5eb20

Hello,

I have the following example code:

library(BiocParallel)
param <- BatchtoolsParam(workers=5, cluster="slurm", template=tmpl)
register(param)
## do work
FUN <- function(x, y) {
    library(pkg)  # Works
}
xx <- bplapply(1:10, FUN)

FUN is an exported function in my package. FUN is working on independent workers so I have to load pkg inside FUN, but I remember library(pkg) is not allowed in the function body in rcmdcheck/BiocCheck. Where should I place library(pkg)?

Regards

BatchtoolsParam BiocParallel • 1.3k views
@martin-morgan-1513

It sounds like FUN from your package uses a function foo from pkg and relies on pkg being on the search path, e.g., because your DESCRIPTION file has Depends: pkg. This design is fragile: if the user, or another package attached after pkg, defines its own foo, that definition would be used instead of the intended function.

Instead, change the DESCRIPTION file to

Imports: pkg

and the NAMESPACE file to

importFrom(pkg, foo)

BiocParallel will then work without needing to attach pkg in FUN.

If FUN were provided by a user and not in a package, then it would be necessary to add the call library(pkg), or to fully resolve the reference to foo() as pkg::foo(); there is nothing wrong with calling library(pkg) in a function defined in this way.
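The fragility of relying on the search path can be seen without any cluster at all. The following is only a sketch in base R: stats::median stands in for the hypothetical pkg::foo, and the names FUN_fragile and FUN_robust are made up for illustration.

```r
## Stand-in for pkg::foo(): stats::median(). Here a user masks it with
## their own definition, as could happen in any interactive session.
median <- function(x) "masked"

## Relies on whatever 'median' is found first: picks up the masked version.
FUN_fragile <- function(x) median(x)

## Fully resolved reference: always uses the function from 'stats'.
FUN_robust <- function(x) stats::median(x)

fragile_result <- FUN_fragile(c(1, 2, 3))
robust_result <- FUN_robust(c(1, 2, 3))

## Clean up the masking definition.
rm(median)
```

Inside a package, importFrom(pkg, foo) gives the same protection as the fully resolved call, because the imported foo is found before anything on the user's search path.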


Let me describe my questions in more detail.

My package mypkg has the large function largeFun and small function smallFun. Since jobs are submitted to remote independent clusters, I need library(mypkg) or mypkg::smallFun(x, y) on each of the independent clusters. Otherwise smallFun is not found.

Do I have to use the double colon format mypkg::smallFun(x, y)? If I use library(mypkg), BiocCheck::BiocCheck raises warnings: The following files call library or require on mypkg. This is not necessary.

What is the right format to use library(mypkg) in this case?

largeFun <- function() {
  # Request 5 remote clusters.
  param <- BatchtoolsParam(workers=5, cluster="slurm", template=tmpl)
  register(param)
  FUN <- function(x, y) { library(mypkg); smallFun(x, y) }
  # FUN works on each of the 5 independent clusters.
  xx <- bplapply(1:5, FUN)
}

smallFun should be available on the worker automatically. To confirm this I created a test package

devtools::create("TestPackage")

then added a simple file R/funs.R

small <- function(x, y) {
    x + y
}

#' @export
large <- function(BPPARAM) {
    FUN <- function(x, y) small(x, y)
    bplapply(1:10, FUN, 1, BPPARAM = BPPARAM)
}

Then created the NAMESPACE and installed the package

devtools::document()
devtools::install()

And then in a new session I can

library(BiocParallel); library(TestPackage)
large(SnowParam(2))
large(BatchtoolsParam(2, "socket"))

I don't have access to a slurm cluster, but the underlying machinery is the same and I would expect large(BatchtoolsParam(2, "slurm")) to work, too.

The reason that I'm confident that this works is mentioned in the last paragraph of the 'Introduction to BiocParallel' vignette

In bplapply(), the environment of FUN (other than the global environment) is serialized to the workers. A consequence is that, when FUN is inside a package name space, other functions available in the name space are available to FUN on the workers.

This in turn is because an R function is actually the function plus the environment in which it is defined. large is defined in the package namespace (an environment), and so the definition of FUN includes the other functions in that environment, in this case small.
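The "function plus environment" idea can be checked in base R. This is only a sketch: an ordinary environment named ns (a made-up stand-in) plays the role of the package namespace, and a serialize/unserialize round trip mimics sending FUN to a worker. A real namespace is handled by reference, with the package loaded on the worker, so the mechanics differ slightly, but the lookup of small works the same way.

```r
## 'ns' is a hypothetical stand-in for the namespace of TestPackage.
ns <- new.env()
local({
    small <- function(x, y) x + y
    large <- function() {
        ## FUN is created inside large(), so its enclosing environments
        ## are the call frame of large() and then 'ns'.
        FUN <- function(x, y) small(x, y)
        FUN
    }
}, envir = ns)

FUN <- ns$large()

## 'small' is reachable from FUN's environment chain.
has_small <- exists("small", envir = environment(FUN))

## Round trip through serialization, roughly what happens when a
## function is shipped to a worker: the environment travels with it.
FUN2 <- unserialize(serialize(FUN, NULL))

FUN(1, 2)   # 3
FUN2(1, 2)  # 3
```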

If you've tested this and it fails, then I suspect that you've mis-diagnosed the problem; perhaps you could share your repository and actual code to reproduce the problem (easily!). It could be that the slurm implementation of batchtools is actually different from other implementations (I would be surprised) so you might confirm that the problem you are having still occurs when using, e.g., SnowParam().


You are right: In bplapply(), the environment of FUN (other than the global environment) is serialized to the workers. I tested smallFun(x, y) instead of mypkg::smallFun(x, y) on SLURM. It works fine. Thanks for your explanation!
