Question: DiffBind does not run in parallel
0
16 months ago by
eggrandio0 wrote:

Hi,

I am trying to run DiffBind in parallel, but it does not detect multiple cores, even though I can detect them with:

> parallel::detectCores()
[1] 8

Here is what I see when I run DiffBind on a test dataset:

> test = dba(sampleSheet = "TEST.xls")
wt_D_1 wt  D  1 bayes
wt_D_2 wt  D  2 bayes
> test.counts = dba.count(test, minOverlap=1)
Sample: 01_wt_D_1_BAM_MD_asBED.bed125
Sample: 02_wt_D_2_BAM_MD_asBED.bed125
Sample: Input_files/25_wt_D_INP_BAM_MD_asBED.bed125
Warning message:
In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.

What should I do to run dba.count in parallel? It would reduce the analysis time a lot for me.

If you need any more info, or want me to run any other diagnostic command, please ask.

Thanks!

PS: Here is my sessionInfo() output in case it helps:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] DiffBind_2.4.8 SummarizedExperiment_1.6.5 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2
 [6] GenomicRanges_1.28.6 GenomeInfoDb_1.12.3 IRanges_2.10.5 S4Vectors_0.14.7 BiocGenerics_0.22.1

loaded via a namespace (and not attached):
 [1] edgeR_3.18.1 bit64_0.9-7 splines_3.4.0 gtools_3.5.0 assertthat_0.2.0
 [6] latticeExtra_0.6-28 amap_0.8-14 RBGL_1.52.0 blob_1.1.0 GenomeInfoDbData_0.99.0
[11] Rsamtools_1.28.0 ggrepel_0.7.0 Category_2.42.1 pillar_1.1.0 RSQLite_2.0
[16] backports_1.1.2 lattice_0.20-35 glue_1.2.0 limma_3.32.10 digest_0.6.14
[21] RColorBrewer_1.1-2 XVector_0.16.0 checkmate_1.8.5 colorspace_1.3-2 Matrix_1.2-12
[26] plyr_1.8.4 GSEABase_1.38.2 XML_3.98-1.9 pkgconfig_2.0.1 pheatmap_1.0.8
[31] ShortRead_1.34.2 biomaRt_2.32.1 genefilter_1.58.1 zlibbioc_1.22.0 xtable_1.8-2
[36] GO.db_3.4.1 scales_0.5.0 brew_1.0-6 gdata_2.18.0 BiocParallel_1.10.1
[41] tibble_1.4.1 annotate_1.54.0 ggplot2_2.2.1 GenomicFeatures_1.28.5 lazyeval_0.2.1
[46] XLConnect_0.2-13 magrittr_1.5 survival_2.41-3 memoise_1.1.0 systemPipeR_1.10.2
[51] gplots_3.0.1 hwriter_1.3.2 GOstats_2.42.0 graph_1.54.0 tools_3.4.0
[56] data.table_1.10.4-3 BBmisc_1.11 sendmailR_1.2-1 munsell_0.4.3 locfit_1.5-9.1
[61] bindrcpp_0.2 AnnotationDbi_1.38.2 Biostrings_2.44.2 compiler_3.4.0 caTools_1.17.1
[66] rlang_0.1.6 grid_3.4.0 RCurl_1.95-4.10 rjson_0.2.15 AnnotationForge_1.18.2
[71] base64enc_0.1-3 bitops_1.0-6 gtable_0.2.0 DBI_0.7 R6_2.2.2
[76] GenomicAlignments_1.12.2 dplyr_0.7.4 rtracklayer_1.36.6 bit_1.1-12 bindr_0.1
[81] XLConnectJars_0.2-13 KernSmooth_2.23-15 rJava_0.9-9 stringi_1.1.6 BatchJobs_1.7
[86] Rcpp_0.12.14

Answer: DiffBind does not run in parallel
1
16 months ago by
Rory Stark2.8k
CRUK, Cambridge, UK
Rory Stark2.8k wrote:

DiffBind uses the "parallel" package (built in to R) to run parallel jobs. This package unfortunately does not support parallel execution on the Windows platform. You'd need to use Linux or Mac OS to run in parallel.

-Rory

Thanks for the quick reply!

I do not know if it's appropriate, but instead of starting a new thread I wanted to ask you a follow-up question.

I have a huge dataset (32 files) and when I run the dba.count command, it sometimes skips some files and doesn't count the reads. I have run it several times, and every time the files that get "skipped" are different. I do not know if I'm running out of memory or what could be the cause of this behavior. I have resorted to running it until I get all the files read, but it is very time-consuming.

This is an example of the messages I get after running dba.count:

Warning messages:
1: In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.
2: In DGEList(counts, lib.size = libsize, group = groups, genes = as.character(1:nrow(counts))) :
  library size of zero detected
3: In max(abs(logR)) : no non-missing arguments to max; returning -Inf

Hmmn. Since you are running serially anyway, it may be best to set bParallel=FALSE and see if that is any better.
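
A minimal sketch of that suggestion, assuming (per the answer above) that parallel execution is unavailable on Windows: decide once whether to request parallel counting and pass the result as bParallel. The variable names are illustrative, and the DiffBind calls are left commented out because they need the poster's sample sheet and read files.

```r
library(parallel)

# Number of cores R can see; detectCores() may return NA on some systems.
n_cores <- detectCores()

# Assumption taken from the answer above: no parallel execution on Windows.
on_windows <- .Platform$OS.type == "windows"

# Only request parallel counting when it can plausibly be honoured.
use_parallel <- !on_windows && !is.na(n_cores) && n_cores > 1
message("Requesting parallel counting: ", use_parallel)

# With DiffBind loaded and the files from the question in place:
# test <- dba(sampleSheet = "TEST.xls")
# test.counts <- dba.count(test, minOverlap = 1, bParallel = use_parallel)
```

On the Windows machine described in the question this evaluates to FALSE, so dba.count would be asked to run serially from the start instead of falling back with a warning.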

This is a tough one to debug remotely. One thing you could try if you get desperate is to set:

> debug(DiffBind:::pv.do_getCounts)

and then type "c" whenever it stops.

I tried it with bParallel=FALSE but I am still getting files skipped. Sometimes it's just one, sometimes it's 10 of them.

I am running the debug. What should I look for? Does it give an automated report at the end?

Alternatively, I could add a line so it stops after finding any library with size 0. At least that way I do not have to wait until it has processed all the files to know if it has read them.
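
One way such a check might look, as a rough sketch only: the helper below (check_library_sizes is a made-up name, not a DiffBind function) takes a plain count matrix with one column per sample and stops with an error naming any sample whose total is zero. It runs on counts that have already been collected; how to pull that matrix out of the DBA object, or how to hook the check into dba.count itself, is not shown here.

```r
# Hypothetical helper: fail fast if any sample has a library size of zero.
# `counts` is assumed to be a numeric matrix (rows = peaks, columns = samples).
check_library_sizes <- function(counts) {
  sample_ids <- colnames(counts)
  if (is.null(sample_ids)) {
    # Fall back to column positions when the matrix has no column names.
    sample_ids <- as.character(seq_len(ncol(counts)))
  }
  lib_sizes <- colSums(counts, na.rm = TRUE)
  empty <- sample_ids[lib_sizes == 0]
  if (length(empty) > 0) {
    stop("Library size of zero for: ", paste(empty, collapse = ", "))
  }
  invisible(lib_sizes)
}

# Toy data standing in for real counts; the middle sample has no reads.
toy <- matrix(c(5, 3, 0, 0, 2, 7), nrow = 2,
              dimnames = list(NULL, c("wt_D_1", "wt_D_2", "wt_D_3")))

# Capture the error message instead of halting, just for demonstration.
result <- tryCatch(check_library_sizes(toy),
                   error = function(e) conditionMessage(e))
print(result)
```

Counting the files in small batches and calling the check after each batch would surface a skipped file sooner than waiting for all 32 to finish.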

Thanks a lot for your help !

Answer: DiffBind does not run in parallel
0
16 months ago by
Rory Stark2.8k
CRUK, Cambridge, UK
Rory Stark2.8k wrote:

No report, I was just thinking it would stop and change the timing.

-R