How to toggle between forking and no forking in DESeq2
@user-24353

While running DESeq2 v1.24.0 in an R 3.6.1 kernel under Jupyter Notebook (via JupyterHub v1.1.0), I noticed that it forks by default, even with "parallel=FALSE", but the threads are doing nothing (they appear to be waiting for some process to finish: "epoll_wait(7,...") and the step below takes ~20 minutes. Using DESeq2 v1.30.0 in an R 4.0.3 kernel in the same Jupyter Notebook, the same step finishes in ~4 minutes. I can see it forking there too, but this time the threads are actually doing work (no waiting).

Then, while experimenting with 'register' and 'OMP_NUM_THREADS' in the R code below, I somehow managed to turn forking off (DESeq2 v1.24.0, R 3.6.1 kernel). It now runs on a single CPU and takes ~11 minutes on average to complete the same step. I can't reproduce the 20-minute result because I can't turn forking back on, although nothing else on the system seems to have been updated.

This looks like erratic behavior in DESeq2, and I am wondering whether there is a way to turn forking on and off. Is it controlled inside the DESeq2 code, or somewhere else, for example in the BiocParallel package? We are running DESeq2 on an HPC cluster and would like some level of predictability, especially when other users are running their code on the same node as our DESeq2 code.

Please help us troubleshoot this problem.

I am aware of this related post: "DESeq(): Forking of R Process even though parallel=FALSE".

```r
library("DESeq2")        # also attaches BiocParallel (see sessionInfo below)
library("BiocParallel")

colData <- readRDS("data/metaGRP_noOut.rds")
cts <- readRDS("data/count_noOut.rds")
cts <- round(cts, digits = 0)
head(colData)

ddscounts <- DESeqDataSetFromMatrix(countData = cts, colData = colData, design = ~ class)

# keep genes with a count above 2 in at least 3 samples
keep <- rowSums(counts(ddscounts) > 2) >= 3
ddscounts2 <- ddscounts[keep, ]
#register(SerialParam())
register(MulticoreParam(1))
#numWorkers <- 1
Sys.setenv("OMP_NUM_THREADS" = 1)
#decounts <- DESeq(ddscounts2, parallel = FALSE, BPPARAM = MulticoreParam(numWorkers))
decounts <- DESeq(ddscounts2, parallel = FALSE, BPPARAM = MulticoreParam(1))
#decounts <- DESeq(ddscounts2)
```


```r
sessionInfo()
```

```
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.2 (Maipo)

Matrix products: default
BLAS:   /sysapps/cluster/software/Anaconda2/5.3.0/lib/libblas.so.3.8.0
LAPACK: /sysapps/cluster/software/Anaconda2/5.3.0/lib/liblapack.so.3.8.0

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.24.0               SummarizedExperiment_1.14.1
 [3] DelayedArray_0.10.0         BiocParallel_1.18.1        
 [5] matrixStats_0.55.0          Biobase_2.44.0             
 [7] GenomicRanges_1.36.1        GenomeInfoDb_1.20.0        
 [9] IRanges_2.18.3              S4Vectors_0.22.1           
[11] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            jsonlite_1.7.1         splines_3.6.1         
 [4] Formula_1.2-3          latticeExtra_0.6-28    blob_1.2.0            
 [7] GenomeInfoDbData_1.2.1 RSQLite_2.1.2          pillar_1.4.7          
[10] backports_1.1.5        lattice_0.20-38        glue_1.4.2            
[13] uuid_0.1-2             digest_0.6.21          RColorBrewer_1.1-2    
[16] XVector_0.24.0         checkmate_1.9.4        colorspace_1.4-1      
[19] htmltools_0.3.6        Matrix_1.2-17          XML_3.98-1.20         
[22] pkgconfig_2.0.3        genefilter_1.66.0      zlibbioc_1.30.0       
[25] purrr_0.3.2            xtable_1.8-4           scales_1.0.0          
[28] htmlTable_1.13.2       tibble_3.0.4           annotate_1.62.0       
[31] generics_0.1.0         ggplot2_3.3.2.9000     ellipsis_0.3.1        
[34] repr_1.0.1             nnet_7.3-12            survival_3.2-7        
[37] magrittr_2.0.1         crayon_1.3.4           memoise_1.1.0         
[40] evaluate_0.14          foreign_0.8-72         tools_3.6.1           
[43] data.table_1.12.4      lifecycle_0.2.0        stringr_1.4.0         
[46] locfit_1.5-9.1         munsell_0.5.0          cluster_2.1.0         
[49] AnnotationDbi_1.48.0   compiler_3.6.1         rlang_0.4.8           
[52] grid_3.6.1             RCurl_1.95-4.12        pbdZMQ_0.3-3          
[55] IRkernel_1.0.2         rstudioapi_0.13        htmlwidgets_1.3       
[58] bitops_1.0-6           base64enc_0.1-3        gtable_0.3.0          
[61] DBI_1.0.0              R6_2.4.0               gridExtra_2.3         
[64] knitr_1.25             dplyr_1.0.2            bit_1.1-14            
[67] Hmisc_4.2-0            stringi_1.4.6          IRdisplay_0.7.0       
[70] Rcpp_1.0.2             geneplotter_1.62.0     vctrs_0.3.5           
[73] rpart_4.1-15           acepack_1.4.1          tidyselect_1.1.0      
[76] xfun_0.10
```

Thank you!


Perhaps the conda environment you are using has a multi-threaded BLAS (linear algebra) package, and you are seeing threads (rather than forks) during linear algebra computations?
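A quick way to test this from inside R, using only base R: time a large matrix product and compare CPU time with wall-clock time. `system.time()` reports CPU time of the R process itself (`user.self`, which on Linux includes all of its threads) separately from time spent in child processes that have finished (`user.child`), which is where forked workers would be counted. A `ratio` well above 1 suggests the BLAS is running several threads inside the R process; a ratio near 1 suggests a single thread. The matrix size is an arbitrary choice, and other load on a shared node can blur the numbers, so treat this as a rough indication only.

```r
# Rough check for a multi-threaded BLAS. n = 2000 is an arbitrary size chosen
# so that the product takes long enough to time; adjust if needed.
blas_thread_check <- function(n = 2000) {
  m <- matrix(rnorm(n * n), n, n)
  t <- system.time(m %*% m)
  c(elapsed   = t[["elapsed"]],
    user_self = t[["user.self"]],   # CPU time of this process, all threads
    user_kids = t[["user.child"]],  # CPU time of finished child processes
    ratio     = t[["user.self"]] / max(t[["elapsed"]], 1e-3))
}

set.seed(42)
print(blas_thread_check())
```

Running the same check in both conda environments (R 3.6.1 and R 4.0.3) would show whether their BLAS libraries thread differently.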

@mikelove

If parallel=FALSE (the default), there is no code in DESeq2 that uses additional cores.

Not sure how else to advise...
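For context, DESeq2's parallel code path goes through BiocParallel and is only taken when `parallel = TRUE`; `BPPARAM` then selects the back end, and it is not used otherwise. A minimal sketch, assuming DESeq2 and BiocParallel are installed and using DESeq2's simulated `makeExampleDESeqDataSet()` in place of real counts, of pinning each run to an explicit back end: `SerialParam()` never forks, while `MulticoreParam()` forks worker R processes on Unix-like systems.

```r
library(DESeq2)
library(BiocParallel)

# Simulated data stands in for a real count matrix (assumption: the back-end
# behaviour does not depend on the data)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# parallel = FALSE: the standard serial pipeline; BPPARAM is not consulted
dds_serial <- DESeq(dds, parallel = FALSE, quiet = TRUE)

# parallel = TRUE: work is split across 2 forked workers via BiocParallel
dds_fork <- DESeq(dds, parallel = TRUE,
                  BPPARAM = MulticoreParam(workers = 2), quiet = TRUE)

# Session-wide default returned by bpparam(), used when BPPARAM is omitted
register(SerialParam())
print(class(bpparam()))
```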


But this is what I observe when running DESeq2 with 1 worker, with either parallel=FALSE or parallel=TRUE:

[screenshot not available]

And printing 'MulticoreParam()' shows:

```
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 14; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK
```

Any idea how to turn forking off?


See Martin's comment above. It's not something controllable within R; it's more likely happening at the level of your linear algebra libraries, such as BLAS.
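To tell threads from forks directly, here is a sketch that assumes a Linux node with the standard `/proc` filesystem and the procps `ps` command. Threads of the notebook's R process share its PID and are counted in the `Threads:` field of `/proc/<pid>/status`, whereas forked workers (such as those `MulticoreParam` creates) appear as separate processes whose parent PID is the R process. The helper name `inspect_r_process()` is made up for illustration.

```r
# Hypothetical helper (not part of any package): summarise threads and child
# processes of an R process. Assumes Linux (/proc) and procps `ps` (--ppid).
inspect_r_process <- function(pid = Sys.getpid()) {
  status <- readLines(file.path("/proc", pid, "status"))
  threads_line <- grep("^Threads:", status, value = TRUE)
  n_threads <- as.integer(sub("^Threads:[[:space:]]*", "", threads_line))

  # Forked workers are separate processes whose parent PID is `pid`; threads
  # are not listed here. When `pid` is the current session, the `ps` call
  # itself may show up in this list.
  children <- suppressWarnings(
    system2("ps", c("--ppid", pid, "-o", "pid=,comm="), stdout = TRUE)
  )
  list(pid = pid, threads = n_threads, children = trimws(children))
}

str(inspect_r_process())
```

In practice: run `Sys.getpid()` in the notebook, then call `inspect_r_process()` with that PID from a separate R session while `DESeq()` is running. Many threads and no children points to threading (for example in BLAS); several child processes points to forking.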


Martin and Michael,

Thank you for your suggestions. That is very possible, as I am using a different conda env for each R version here. How would I troubleshoot a multi-threaded BLAS, and why would it suddenly stop running on multiple threads, as with my R 3.6.1 kernel above?

Setting the number of threads to 1 doesn't help, neither inside the R code:

```r
Sys.setenv("OMP_NUM_THREADS" = 1)
```

nor in the conda env where R is installed:

```bash
export OMP_NUM_THREADS=1
```
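A plausible explanation, offered as an assumption rather than a diagnosis: OpenMP runtimes and OpenBLAS typically read `OMP_NUM_THREADS` (OpenBLAS also `OPENBLAS_NUM_THREADS`, MKL `MKL_NUM_THREADS`) once, when the library is initialised, so `Sys.setenv()` inside an already-running R session may come too late, and an `export` in a shell only reaches the Jupyter kernel if the kernel is started from that shell. The base-R sketch below, assuming a Unix-like system, prints which of these variables the current session actually inherited and then starts a fresh `Rscript` with them set at launch. It only confirms that the child sees the values from the start; whether a given BLAS honours them still has to be checked, for example by timing a large matrix product.

```r
# Thread-count variables commonly read by OpenMP, OpenBLAS and MKL
thread_vars <- c("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")

# What this R process inherited at start-up ("" means unset)
print(Sys.getenv(thread_vars))

# Launch a fresh R with the variables already set at start-up, i.e. before
# any BLAS/OpenMP library is loaded. Assumes a Unix-like OS.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(
  rscript,
  args   = c("-e", shQuote("writeLines(Sys.getenv('OMP_NUM_THREADS'))")),
  env    = paste0(thread_vars, "=1"),
  stdout = TRUE
)
print(out)  # the value the child process started with
```

For a Jupyter kernel, the equivalent is to set the variables where the kernel process is launched, for example in the `env` field of the kernel's `kernel.json`.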


I think the first step is to figure out which BLAS you're using, perhaps following https://stackoverflow.com/a/9668217/547331 but adapted for conda; I don't know what magic is needed to do this in conda.
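A base-R sketch for this, assuming Linux: R reports the BLAS and LAPACK files it uses via `extSoftVersion()["BLAS"]` and `La_library()` (the same paths `sessionInfo()` prints; both should exist in R 3.4 and later). In conda these paths, such as the `libblas.so.3.8.0` in the sessionInfo above, may well be symlinks, so resolving them with `normalizePath()` can reveal the actual implementation (for example an OpenBLAS or MKL file name). Listing the shared objects mapped into the running process via `/proc/self/maps` is a second cross-check; the name patterns in the `grep()` are only a guess at common library names.

```r
# Files R reports for BLAS and LAPACK (the same paths sessionInfo() prints)
blas_path   <- unname(extSoftVersion()["BLAS"])
lapack_path <- La_library()

# Conda often installs these as symlinks; resolve them to the real files
resolve <- function(p) {
  if (!is.na(p) && nzchar(p) && file.exists(p)) normalizePath(p) else NA_character_
}
resolved <- vapply(c(BLAS = blas_path, LAPACK = lapack_path), resolve, character(1))
print(resolved)

# Linux cross-check: BLAS-like shared objects mapped into this R process
if (file.exists("/proc/self/maps")) {
  maps <- grep("blas|lapack|mkl|atlas", readLines("/proc/self/maps"),
               value = TRUE, ignore.case = TRUE)
  print(unique(sub(".*[[:space:]](/[^[:space:]]+)$", "\\1", maps)))
}
```

Running this in each conda environment should show whether the two R installations link against different BLAS implementations.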
