While running DESeq2 v1.24.0 in R kernel v3.6.1 under Jupyter Notebook (via JupyterHub v1.1.0), I noticed that it forks by default, even with "parallel=FALSE". The threads do nothing (they wait for some process to finish, "epoll_wait(7,..."), and the step below takes ~20 minutes. By contrast, with DESeq2 v1.30.0 in R kernel v4.0.3 in the same Jupyter Notebook, the same step finishes in ~4 minutes. I can see it forking there too, but this time the threads are actually doing work (no waiting).
Then, while experimenting with 'register' and 'OMP_NUM_THREADS' in the R code below, I somehow managed to turn forking off (DESeq2 v1.24.0 in R kernel 3.6.1). It now runs on a single CPU and takes ~11 minutes on average to complete the same step. I can't reproduce the 20-minute result because I can't turn forking back on, yet nothing else seems to have been updated on the system.
This looks like erratic behavior in DESeq2, and I am wondering whether there is a way to turn forking on and off. Is it defined inside the DESeq2 code or somewhere else, for example in the BiocParallel library? We are running DESeq2 in an HPC cluster environment and would like some level of predictability, especially when other users are running their code on the same node as the DESeq2 code.
Please help us troubleshoot this problem.
I am aware of this post: DESeq(): Forking of R Process even though parallel=FALSE
colData=readRDS( "data/metaGRP_noOut.rds")
cts=readRDS( "data/count_noOut.rds")
cts=round(cts,digits=0)
head(colData)
library("DESeq2")
ddscounts <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design=~ + class)
keep <- rowSums(counts(ddscounts)>2) >=3
ddscounts2 <- ddscounts[keep,]
#register(SerialParam())
register(MulticoreParam(1))
#numWorkers <- 1
Sys.setenv("OMP_NUM_THREADS" = 1)
#decounts <-DESeq(ddscounts2, parallel=FALSE, BPPARAM=MulticoreParam(numWorkers))
decounts <-DESeq(ddscounts2, parallel=FALSE, BPPARAM=MulticoreParam(1))
#decounts <-DESeq(ddscounts2)
sessionInfo( )
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.2 (Maipo)
Matrix products: default
BLAS: /sysapps/cluster/software/Anaconda2/5.3.0/lib/libblas.so.3.8.0
LAPACK: /sysapps/cluster/software/Anaconda2/5.3.0/lib/liblapack.so.3.8.0
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.24.0 SummarizedExperiment_1.14.1
[3] DelayedArray_0.10.0 BiocParallel_1.18.1
[5] matrixStats_0.55.0 Biobase_2.44.0
[7] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
[9] IRanges_2.18.3 S4Vectors_0.22.1
[11] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 jsonlite_1.7.1 splines_3.6.1
[4] Formula_1.2-3 latticeExtra_0.6-28 blob_1.2.0
[7] GenomeInfoDbData_1.2.1 RSQLite_2.1.2 pillar_1.4.7
[10] backports_1.1.5 lattice_0.20-38 glue_1.4.2
[13] uuid_0.1-2 digest_0.6.21 RColorBrewer_1.1-2
[16] XVector_0.24.0 checkmate_1.9.4 colorspace_1.4-1
[19] htmltools_0.3.6 Matrix_1.2-17 XML_3.98-1.20
[22] pkgconfig_2.0.3 genefilter_1.66.0 zlibbioc_1.30.0
[25] purrr_0.3.2 xtable_1.8-4 scales_1.0.0
[28] htmlTable_1.13.2 tibble_3.0.4 annotate_1.62.0
[31] generics_0.1.0 ggplot2_3.3.2.9000 ellipsis_0.3.1
[34] repr_1.0.1 nnet_7.3-12 survival_3.2-7
[37] magrittr_2.0.1 crayon_1.3.4 memoise_1.1.0
[40] evaluate_0.14 foreign_0.8-72 tools_3.6.1
[43] data.table_1.12.4 lifecycle_0.2.0 stringr_1.4.0
[46] locfit_1.5-9.1 munsell_0.5.0 cluster_2.1.0
[49] AnnotationDbi_1.48.0 compiler_3.6.1 rlang_0.4.8
[52] grid_3.6.1 RCurl_1.95-4.12 pbdZMQ_0.3-3
[55] IRkernel_1.0.2 rstudioapi_0.13 htmlwidgets_1.3
[58] bitops_1.0-6 base64enc_0.1-3 gtable_0.3.0
[61] DBI_1.0.0 R6_2.4.0 gridExtra_2.3
[64] knitr_1.25 dplyr_1.0.2 bit_1.1-14
[67] Hmisc_4.2-0 stringi_1.4.6 IRdisplay_0.7.0
[70] Rcpp_1.0.2 geneplotter_1.62.0 vctrs_0.3.5
[73] rpart_4.1-15 acepack_1.4.1 tidyselect_1.1.0
[76] xfun_0.10
Thank you!
Perhaps the conda environment you are using has a multi-threaded BLAS (linear algebra) package, and you are seeing threads (rather than forks) during linear algebra computations?
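One way to test this could look like the sketch below. It uses only base R, reports which BLAS/LAPACK libraries the session is linked against, asks for a single thread, and times a small matrix product so you can compare CPU usage with and without the limit. The three environment variable names are the ones commonly read by OpenMP, OpenBLAS and MKL builds; which of them your conda BLAS actually honours is an assumption on my part. These variables are generally read when the library is loaded, so setting them before R starts (for example in the kernel's environment) is likely more reliable than calling Sys.setenv() mid-session. The RhpcBLASctl package is optional and is only used if it happens to be installed.

```r
# Which BLAS/LAPACK shared libraries is this R session linked against?
si <- sessionInfo()
blas_info <- c(BLAS = si$BLAS, LAPACK = si$LAPACK)
print(blas_info)

# Thread-count variables commonly read by OpenMP / OpenBLAS / MKL builds
# (assumption: at least one of them applies to the BLAS in this environment).
thread_vars <- c("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
before <- Sys.getenv(thread_vars, unset = NA)
print(before)

# Request a single thread. This may have no effect if the BLAS library
# has already been initialised in this session.
Sys.setenv(OMP_NUM_THREADS = "1",
           OPENBLAS_NUM_THREADS = "1",
           MKL_NUM_THREADS = "1")

# Optional: set the limit at runtime if RhpcBLASctl is installed.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
  RhpcBLASctl::omp_set_num_threads(1)
}

# Time a matrix product; watch the process in top/htop while it runs.
# With a multi-threaded BLAS, "user" time can clearly exceed "elapsed".
set.seed(1)
m <- matrix(rnorm(500 * 500), nrow = 500)
timing <- system.time(crossprod(m))
print(timing)
```

If the extra threads disappear once the limit is in place, they were most likely BLAS threads rather than forked workers. Note also that the DESeq() documentation describes BPPARAM as being used only when parallel=TRUE, so with parallel=FALSE the registered BiocParallel backend should not be what is creating them.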