Question: DiffBind does not run in parallel
4 weeks ago
eggrandio0 wrote:


I am trying to run DiffBind using parallel execution, but it does not detect multiple cores, although I can detect them with:

> parallel::detectCores()
[1] 8

Here's what I see when I try to run DiffBind on a test run:

> test = dba(sampleSheet = "TEST.xls")
wt_D_1 wt  D  1 bayes
wt_D_2 wt  D  2 bayes
> test.counts = dba.count(test, minOverlap=1)
Sample: 01_wt_D_1_BAM_MD_asBED.bed125 
Sample: 02_wt_D_2_BAM_MD_asBED.bed125 
Sample: Input_files/25_wt_D_INP_BAM_MD_asBED.bed125 
Warning message:
In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.

What should I do to run dba.count in parallel? It would reduce the analysis time a lot for me. If you need any more info, or want me to run any other diagnostic command, please ask.


PS: Here is my sessionInfo() output in case it helps

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_2.4.8             SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
 [6] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7           BiocGenerics_0.22.1       

loaded via a namespace (and not attached):
 [1] edgeR_3.18.1             bit64_0.9-7              splines_3.4.0            gtools_3.5.0             assertthat_0.2.0        
 [6] latticeExtra_0.6-28      amap_0.8-14              RBGL_1.52.0              blob_1.1.0               GenomeInfoDbData_0.99.0 
[11] Rsamtools_1.28.0         ggrepel_0.7.0            Category_2.42.1          pillar_1.1.0             RSQLite_2.0             
[16] backports_1.1.2          lattice_0.20-35          glue_1.2.0               limma_3.32.10            digest_0.6.14           
[21] RColorBrewer_1.1-2       XVector_0.16.0           checkmate_1.8.5          colorspace_1.3-2         Matrix_1.2-12           
[26] plyr_1.8.4               GSEABase_1.38.2          XML_3.98-1.9             pkgconfig_2.0.1          pheatmap_1.0.8          
[31] ShortRead_1.34.2         biomaRt_2.32.1           genefilter_1.58.1        zlibbioc_1.22.0          xtable_1.8-2            
[36] GO.db_3.4.1              scales_0.5.0             brew_1.0-6               gdata_2.18.0             BiocParallel_1.10.1     
[41] tibble_1.4.1             annotate_1.54.0          ggplot2_2.2.1            GenomicFeatures_1.28.5   lazyeval_0.2.1          
[46] XLConnect_0.2-13         magrittr_1.5             survival_2.41-3          memoise_1.1.0            systemPipeR_1.10.2      
[51] gplots_3.0.1             hwriter_1.3.2            GOstats_2.42.0           graph_1.54.0             tools_3.4.0             
[56] data.table_1.10.4-3      BBmisc_1.11              sendmailR_1.2-1          munsell_0.4.3            locfit_1.5-9.1          
[61] bindrcpp_0.2             AnnotationDbi_1.38.2     Biostrings_2.44.2        compiler_3.4.0           caTools_1.17.1          
[66] rlang_0.1.6              grid_3.4.0               RCurl_1.95-4.10          rjson_0.2.15             AnnotationForge_1.18.2  
[71] base64enc_0.1-3          bitops_1.0-6             gtable_0.2.0             DBI_0.7                  R6_2.2.2                
[76] GenomicAlignments_1.12.2 dplyr_0.7.4              rtracklayer_1.36.6       bit_1.1-12               bindr_0.1               
[81] XLConnectJars_0.2-13     KernSmooth_2.23-15       rJava_0.9-9              stringi_1.1.6            BatchJobs_1.7           
[86] Rcpp_0.12.14

modified 4 weeks ago by Rory Stark2.3k • written 4 weeks ago by eggrandio0
4 weeks ago
Rory Stark
CRUK, Cambridge, UK
Rory Stark2.3k wrote:

DiffBind uses the "parallel" package (built into R) to run parallel jobs. This package unfortunately does not support parallel execution on the Windows platform. You'd need to use Linux or Mac OS to run in parallel.
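
As a rough illustration of the platform difference, here is a minimal sketch that picks a worker count by operating system. It assumes the limitation comes from fork-based functions such as mclapply(), which reject mc.cores > 1 on Windows; the thread does not say which calls DiffBind itself makes. The helper name pick_cores is hypothetical and not part of DiffBind.

```r
library(parallel)

# Hypothetical helper (not part of DiffBind): number of workers that
# fork-based functions such as mclapply() can use on this platform.
pick_cores <- function() {
  if (.Platform$OS.type == "windows") {
    1L  # mclapply() rejects mc.cores > 1 on Windows
  } else {
    n <- detectCores()
    if (is.na(n)) 1L else as.integer(n)  # detectCores() may return NA
  }
}

cores <- pick_cores()

# With cores == 1 this runs serially; on Linux or Mac OS it forks workers.
squares <- mclapply(1:4, function(i) i^2, mc.cores = cores)
```

On Windows this runs serially, which matches the "executing serially" warning above.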


written 4 weeks ago by Rory Stark2.3k

Thanks for the quick reply!

I do not know if it's appropriate, but instead of starting a new thread I wanted to ask you a follow-up question.

I have a huge dataset (32 files), and when I run the dba.count command it sometimes skips some files and doesn't count the reads. I have run it several times, and every time the files that get "skipped" are different. I do not know if I'm running out of memory or what else could be causing this behavior. I have resorted to rerunning it until all the files are read, but that is very time-consuming.

This is an example of the messages I get after running dba.count:

Warning messages:
1: In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.
2: In DGEList(counts, lib.size = libsize, group = groups, genes = as.character(1:nrow(counts))) :
  library size of zero detected
3: In max(abs(logR)) : no non-missing arguments to max; returning -Inf
written 4 weeks ago by eggrandio0

Hmm. Since you are running serially anyway, it may be best to set bParallel=FALSE and see if that is any better.

This is a tough one to debug remotely. One thing you could try if you get desperate is to set:

> debug(DiffBind:::pv.do_getCounts)

and then type "c" whenever it stops.

written 4 weeks ago by Rory Stark2.3k

I tried it with bParallel=FALSE but I am still getting files skipped. Sometimes it's just one, sometimes it's 10 of them.

I am running the debug. What should I look for? Does it give an automated report at the end?

Alternatively, I could add a line so it stops after finding any library with size 0. At least that way I do not have to wait until it has processed all the files to know if it has read them.
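
A minimal sketch of that kind of early check, assuming the counts are available as a numeric matrix with one column per sample. The helper name check_lib_sizes and the toy data are hypothetical; this is not DiffBind code, and where to call it inside the counting step is left open.

```r
# Hypothetical helper: stop as soon as any sample has a zero library size.
# `counts` is assumed to be a numeric matrix with one column per sample.
check_lib_sizes <- function(counts) {
  lib_sizes <- colSums(counts)
  empty <- which(lib_sizes == 0)
  if (length(empty) > 0) {
    # Use sample names when the matrix has them, column indices otherwise
    labels <- if (is.null(names(empty))) empty else names(empty)
    stop("Zero library size in: ", paste(labels, collapse = ", "))
  }
  invisible(lib_sizes)
}

# Toy data, made up for illustration: the second sample has no reads
counts <- matrix(c(5, 3, 0, 0, 2, 7), nrow = 2,
                 dimnames = list(NULL, c("wt_D_1", "wt_D_2", "wt_D_3")))

# Capture the error message instead of aborting the script
result <- tryCatch(check_lib_sizes(counts),
                   error = function(e) conditionMessage(e))
```

Here result holds the error message naming the empty sample, so a long run could be aborted at the first skipped file rather than after all 32.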

Thanks a lot for your help!

modified 4 weeks ago • written 4 weeks ago by eggrandio0
4 weeks ago
Rory Stark
CRUK, Cambridge, UK
Rory Stark2.3k wrote:

No report; I was just thinking that stopping in the debugger would change the timing.



written 4 weeks ago by Rory Stark2.3k
