I'm trying to run DESeq2 on a SLURM cluster, using ClusterMQ's SLURM scheduler registered as the foreach %dopar% backend, like so:
library(foreach)
library(clustermq)
library(DESeq2)
library(doParallel)
library(BiocParallel)
# CLUSTER_MQ and FOREACH working well for a toy example
load("/fsx/home/bhayete/Projects/Deseq2DoPar/DeSeq_ToyData.RData")
# USING CLUSTERMQ TO PARALLELIZE DESEQ------------------------NOT WORKING
TIMEOUT = 10000
NJOBS = 100
options(
  clustermq.scheduler = "slurm",
  clustermq.template = 'slurmMq.tmpl',
  clustermq.data.warning = 5000  # megabytes
)
register_dopar_cmq(n_jobs = NJOBS,
                   fail_on_error = FALSE,
                   verbose = TRUE,
                   log_worker = TRUE,
                   timeout = TIMEOUT,  # how long to wait on the MQ side
                   pkgs = c('BiocParallel', 'DESeq2'),
                   template = list(
                     timeout = TIMEOUT,  # how long to wait on the SLURM side
                     memory = 5000,
                     cores = 1,  # how many cores to use (to throttle down memory usage)
                     partition = 'compute-spot',
                     r_path = file.path(R.home("bin"), "R")
                   )
)
dds <- DESeqDataSetFromMatrix(countData = Count_Filt, colData = Metadata, design = ~ CoarseCondition)
print(paste(getDoParWorkers(), "workers", sep = '_'))
doparam <- DoparParam()
# Define workers, otherwise only 1 worker will be used
doparam$workers <- NJOBS
register(doparam)
x = foreach(i=1:300) %dopar% sqrt(i)
x2 = bplapply(1:300, sqrt, BPPARAM = doparam, log_worker=TRUE)  # note: log_worker is not a bplapply() argument, so it is passed on to sqrt() via ...
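In the log below, foreach runs 300 calculations but bplapply runs only 100. One plausible reading, which I have not verified against BiocParallel's source, is that with doparam$workers set to NJOBS = 100, bplapply groups the 300 inputs into one task per worker. This base-R sketch reproduces that grouping; `chunk_evenly` is a hypothetical helper of my own, not BiocParallel's actual splitting code.

```r
# Hypothetical helper: split x into n_chunks contiguous groups whose sizes
# differ by at most one (a stand-in for how tasks might be batched per worker).
chunk_evenly <- function(x, n_chunks) {
  n_chunks <- min(n_chunks, length(x))
  # Element i (0-based) goes to group floor(i * n_chunks / length(x)) + 1
  group <- ((seq_along(x) - 1L) * n_chunks) %/% length(x) + 1L
  split(x, group)
}

NJOBS <- 100
chunks <- chunk_evenly(1:300, NJOBS)
length(chunks)           # 100 tasks, matching "Running 100 calculations"
unique(lengths(chunks))  # 3 inputs per task
```

If this reading is right, the drop from 300 to 100 calculations is just batching and not itself a symptom of the failure.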
The resulting output snippet is as follows. Note that x is computed correctly on the cluster, while x2 does not run. It is as though some internals (of the DESeq2 S4 object, or of BiocParallel itself, since this failing call does not involve DESeq2 at all) are not being exported to the cluster. What does this error mean, and has anyone been able to run bplapply over SLURM by this mechanism?
x = foreach(i=1:300) %dopar% sqrt(i)
Submitting 100 worker jobs (ID: cmq7587) ...
Running 300 calculations (1 objs/0 Mb common; 1 calls/chunk) ...
Master: [2.1s 20.9% CPU]; Worker: [avg 72.7% CPU, max 284.7 Mb]

x2 = bplapply(1:300, sqrt, BPPARAM = doparam, log_worker=TRUE)
Submitting 100 worker jobs (ID: cmq9511) ...
Running 100 calculations (1 objs/0 Mb common; 1 calls/chunk) ...
Master: [2.8s 9.7% CPU]; Worker: [avg 78.8% CPU, max 290.2 Mb]
Warning in summarize_result(job_result, n_errors, n_warnings, cond_msgs, :
  100/100 jobs failed (0 warnings)
(Error #1) could not find function ".bpworker_EXEC"
(Error #10) could not find function ".bpworker_EXEC"
(Error #100) could not find function ".bpworker_EXEC"
(Error #11) could not find function ".bpworker_EXEC"
(Error #12) could not find function ".bpworker_EXEC"
(Error #13) could not find function ".bpworker_EXEC"
(Error #14) could not find function ".bpworker_EXEC"
(Error #15) could not find function ".bpworker_EXEC"
(Error #17) could not find function ".bpworker_EXEC"
(Error #19) could not find function ".bpworker_EXEC"
(Error #2) could not find function ".bpworker_EXEC"
(Error #21) could not find function ".bpworker_EXEC"
(Error #3) could not find function ".bpworker_EXEC"
(Error #4) could not find function ".bpworker_EXEC"
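As far as I understand R's scoping, "could not find function" means the worker evaluated a call to `.bpworker_EXEC` in an environment where no function of that name is visible. This sketch mimics that symptom in a single R session using only base R. `fake_ns` and `.helper_EXEC` are hypothetical stand-ins for a package namespace and its internal helper. Whether ClusterMQ's foreach backend is supposed to export such a name is exactly what I'm unsure about.

```r
# Hypothetical stand-in for a package namespace holding an internal helper.
fake_ns <- new.env()
assign(".helper_EXEC", function(x) sqrt(x), envir = fake_ns)

# The task the master ships to a worker: a bare call to the helper by name.
task <- quote(.helper_EXEC(9))

# A fresh "worker" environment that only sees the global env and attached packages,
# so the helper living in fake_ns is not visible from here.
worker_env <- new.env(parent = globalenv())
first_try <- tryCatch(eval(task, envir = worker_env),
                      error = function(e) conditionMessage(e))
print(first_try)   # message: could not find function ".helper_EXEC"

# Copying the helper into the worker's environment (the role foreach's .export
# argument plays for ordinary variables) makes the same call succeed.
assign(".helper_EXEC", get(".helper_EXEC", envir = fake_ns), envir = worker_env)
second_try <- eval(task, envir = worker_env)
print(second_try)  # 3
```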
Registering ClusterMQ this way lets me use it as the backend for foreach/BiocParallel, which links to SLURM and unlocks far more compute than a single machine provides. I do use BPPARAM=DoparParam() in the process.
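To check the suspicion that an unexported internal is involved, a small base-R helper can report whether a name is exported from a package namespace, meaning whether it becomes visible after `library(pkg)` on a worker, or is reachable only as `pkg:::name`. `is_exported` is a hypothetical helper of my own, built only on base R's `asNamespace()` and `getNamespaceExports()`.

```r
# Hypothetical helper: is `name` exported from package `pkg` (visible after
# library(pkg)), or internal (reachable only via pkg:::name)?
is_exported <- function(name, pkg) {
  ns <- asNamespace(pkg)  # loads the namespace without attaching it
  if (!exists(name, envir = ns, inherits = FALSE)) {
    stop(sprintf("'%s' is not defined in the %s namespace", name, pkg))
  }
  name %in% getNamespaceExports(pkg)
}

# Base-R check: sd() is exported from stats.
is_exported("sd", "stats")  # TRUE
```

Assuming BiocParallel is installed, `is_exported(".bpworker_EXEC", "BiocParallel")` would show whether `pkgs = c('BiocParallel', ...)` can make that function visible on the workers at all. Given the leading-dot naming I would expect FALSE, but I have not confirmed it for my installed version.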