BiocParallel with drake hangs
biomiha ▴ 20
@biomiha-11346
Last seen 6 months ago
UK/Cambridge

Has anyone tried using BiocParallel inside a drake plan? It hangs unless I set lock_envir = FALSE, which suggests that BiocParallel is somehow modifying the global environment. I'd like to know which part does that, so that I can expose it before running make.

BiocParallel • 1.4k views

Can you provide more detail, e.g., a link to 'drake plan' and an explanation of what 'lock_envir =' does? When you say 'modifying the global environment', are you referring to the R global environment, the operating system environment, or something else? Could you also provide a simple reproducible example, e.g., bplapply(1:5, sqrt, BPPARAM = SnowParam()), that illustrates where the problem occurs?


Hi Martin,

Yes, apologies for not providing a reprex. I'm using the {drake} package (https://cran.r-project.org/web/packages/drake/index.html).

The following hangs forever:

library(drake)
library(SingleCellExperiment)
library(scran)
library(BiocParallel)  # provides MulticoreParam()
# gen_graph is a function I've written to write out an igraph object from a SingleCellExperiment object
gen_graph <- function(sce, graph_out){
  if(!("logcounts" %in% names(assays(sce)))) logcounts(sce) <- assay(sce)
  param <- MulticoreParam()
  g <- buildSNNGraph(x = sce, BPPARAM = param)
  igraph::write_graph(graph = g, file = graph_out, format = "edgelist")
}
set.seed(42)
my_plan <- drake_plan(
  cell_type1 = matrix(data = rep(rnorm(100, mean = 10), 10), ncol = 10),
  cell_type2 = matrix(data = rep(rnorm(100, mean = 150), 10), ncol = 10),
  mock_sce = SingleCellExperiment(assays = list(logcounts = cbind(cell_type1, cell_type2))),
  igr = gen_graph(sce = mock_sce, graph_out = file_out("mock_graph.txt"))
)

drake::vis_drake_graph(my_plan)
drake::make(my_plan)

What works is if I change the gen_graph function to not include BPPARAM:

gen_graph <- function(sce, graph_out){
  if(!("logcounts" %in% names(assays(sce)))) logcounts(sce) <- assay(sce)
  g <- buildSNNGraph(x = sce)
  igraph::write_graph(graph = g, file = graph_out, format = "edgelist")
}

Or if I keep gen_graph as is and unlock the environment in drake::make, like so:

drake::vis_drake_graph(my_plan)
drake::make(my_plan, lock_envir = FALSE)
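
For context, here is a minimal base-R sketch of what a locked environment does. It assumes that drake's lock_envir option relies on R's lockEnvironment() mechanism; the environment env, the helper try_assign() and the variables x and y are hypothetical stand-ins, not part of drake or BiocParallel.

```r
# Hypothetical stand-in for the environment that drake locks during make().
env <- new.env()
assign("x", 1, envir = env)

# Lock the environment and all of its existing bindings.
lockEnvironment(env, bindings = TRUE)

# Hypothetical helper: attempt an assignment and report whether it succeeded.
try_assign <- function(name, value) {
  tryCatch({
    assign(name, value, envir = env)
    "ok"
  }, error = function(e) paste("error:", conditionMessage(e)))
}

modify_result <- try_assign("x", 2)  # changing an existing, locked binding
add_result <- try_assign("y", 1)     # adding a new binding to a locked environment

print(modify_result)
print(add_result)
```

Both assignments fail with an error, so code that writes to a locked environment cannot proceed as it normally would.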

It would still help to simplify your example further, to direct calls to BiocParallel like the one provided in my first comment. It would also help to see whether the problem involves drake at all, or whether the process (in the simple example) hangs regardless. From your question it seems that this genuinely involves an interaction between BiocParallel and drake. One possibility is that BiocParallel chooses a socket 'port' for communication between manager and workers by generating a random number. This could change the random number stream. Solutions might be to provide a port as described in the 'Global Options' section of ?SnowParam, or to start the cluster with bpstart(param) outside the drake command.
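
The random-number hypothesis can be illustrated in base R alone. The sketch below does not use BiocParallel's actual port-selection code; the port range and the sample() call are assumptions chosen purely for illustration. It only shows that drawing a random number updates .Random.seed, which lives in the R global environment.

```r
# Fix the seed so that .Random.seed exists in the global environment.
set.seed(42)
seed_before <- get(".Random.seed", envir = globalenv())

# Hypothetical port draw; the range is an assumption, not BiocParallel's own.
port <- sample(11000:11999, 1)

seed_after <- get(".Random.seed", envir = globalenv())

# TRUE if the random draw modified the state stored in the global environment.
seed_changed <- !identical(seed_before, seed_after)
print(seed_changed)
```

If the global environment is locked, a write of this kind is one plausible candidate for the modification being blocked.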


Hi Martin,

Using your example, SnowParam() works, whereas MulticoreParam() hangs. I'm not sure I understand the difference well enough to grasp what's going on.

Thanks.


Can you help me to make a simple reproducible example? For instance,

library(drake)
library(BiocParallel)

set.seed(42)

my_plan <- drake_plan(
    bplapply(1:5, sqrt, BPPARAM = MulticoreParam())
)

drake::make(my_plan)

results in

> drake::make(my_plan)
In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
target drake_target_1
fail drake_target_1
Error: target `drake_target_1` failed. Call `drake::diagnose(drake_target_1)` for details. Error message:
  $ operator is invalid for atomic vectors

If you help me to get a simple reproducible example using the packages I don't know anything about, I'll be able to help you answer questions about packages I do know about.


Hi Martin,

The exact code you pasted yourself:

library(drake)
library(BiocParallel)

set.seed(42)

my_plan <- drake_plan(
    bplapply(1:5, sqrt, BPPARAM = MulticoreParam())
)

drake::make(my_plan)

hangs for me (as in, it never finishes) in RStudio Server, which is why I unfortunately can't report an error message: there is none, and I have to interrupt R manually. On my Mac, however, I get the same $ operator is invalid for atomic vectors error message.

This does not hang and produces the expected result (on both platforms):

library(drake)
library(BiocParallel)

set.seed(42)

my_plan <- drake_plan(
    bplapply(1:5, sqrt, BPPARAM = MulticoreParam())
)

drake::make(my_plan, lock_envir = FALSE)

with the warning message:

In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
target drake_target_1
target drake_target_1 warnings:
  'package:stats' may not be available when loading
  'package:stats' may not be available when loading

To load the results of the drake plan, you need to call loadd(target1):

> target1
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

[[4]]
[1] 2

[[5]]
[1] 2.236068

I was hoping that someone using drake had noticed this issue before. I apologise.

