DiffBind does not run in parallel
eggrandio • 0
@eggrandio-14403
United States

Hi,

I am trying to run DiffBind with parallel execution, but it does not detect multiple cores, even though R itself detects them:

> parallel::detectCores()
[1] 8

Here is what I see in a test run of DiffBind:

> test = dba(sampleSheet = "TEST.xls")
wt_D_1 wt  D  1 bayes
wt_D_2 wt  D  2 bayes
> test.counts = dba.count(test, minOverlap=1)
Sample: 01_wt_D_1_BAM_MD_asBED.bed125 
Sample: 02_wt_D_2_BAM_MD_asBED.bed125 
Sample: Input_files/25_wt_D_INP_BAM_MD_asBED.bed125 
Warning message:
In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.

What should I do to run dba.count in parallel? It would greatly reduce my analysis time. If you need more information, or want me to run any other diagnostic command, please ask.

Thanks!

PS: Here is my sessionInfo() output in case it helps

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_2.4.8             SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
 [6] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7           BiocGenerics_0.22.1       

loaded via a namespace (and not attached):
 [1] edgeR_3.18.1             bit64_0.9-7              splines_3.4.0            gtools_3.5.0             assertthat_0.2.0        
 [6] latticeExtra_0.6-28      amap_0.8-14              RBGL_1.52.0              blob_1.1.0               GenomeInfoDbData_0.99.0 
[11] Rsamtools_1.28.0         ggrepel_0.7.0            Category_2.42.1          pillar_1.1.0             RSQLite_2.0             
[16] backports_1.1.2          lattice_0.20-35          glue_1.2.0               limma_3.32.10            digest_0.6.14           
[21] RColorBrewer_1.1-2       XVector_0.16.0           checkmate_1.8.5          colorspace_1.3-2         Matrix_1.2-12           
[26] plyr_1.8.4               GSEABase_1.38.2          XML_3.98-1.9             pkgconfig_2.0.1          pheatmap_1.0.8          
[31] ShortRead_1.34.2         biomaRt_2.32.1           genefilter_1.58.1        zlibbioc_1.22.0          xtable_1.8-2            
[36] GO.db_3.4.1              scales_0.5.0             brew_1.0-6               gdata_2.18.0             BiocParallel_1.10.1     
[41] tibble_1.4.1             annotate_1.54.0          ggplot2_2.2.1            GenomicFeatures_1.28.5   lazyeval_0.2.1          
[46] XLConnect_0.2-13         magrittr_1.5             survival_2.41-3          memoise_1.1.0            systemPipeR_1.10.2      
[51] gplots_3.0.1             hwriter_1.3.2            GOstats_2.42.0           graph_1.54.0             tools_3.4.0             
[56] data.table_1.10.4-3      BBmisc_1.11              sendmailR_1.2-1          munsell_0.4.3            locfit_1.5-9.1          
[61] bindrcpp_0.2             AnnotationDbi_1.38.2     Biostrings_2.44.2        compiler_3.4.0           caTools_1.17.1          
[66] rlang_0.1.6              grid_3.4.0               RCurl_1.95-4.10          rjson_0.2.15             AnnotationForge_1.18.2  
[71] base64enc_0.1-3          bitops_1.0-6             gtable_0.2.0             DBI_0.7                  R6_2.2.2                
[76] GenomicAlignments_1.12.2 dplyr_0.7.4              rtracklayer_1.36.6       bit_1.1-12               bindr_0.1               
[81] XLConnectJars_0.2-13     KernSmooth_2.23-15       rJava_0.9-9              stringi_1.1.6            BatchJobs_1.7           
[86] Rcpp_0.12.14

diffbind parallel biocparallel • 2.4k views
Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK

DiffBind uses the "parallel" package (built into R) to run parallel jobs. Unfortunately, the fork-based functions in that package do not support parallel execution on Windows, so you'd need to use Linux or macOS to run in parallel.

-Rory
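To illustrate the platform difference described above, here is a minimal sketch in plain R. It is not DiffBind code and does not make dba.count run in parallel on Windows; it only shows that fork-based helpers such as mclapply() are limited to one core on Windows, while the socket clusters in the same package work on any platform. The variable names and the toy squaring task are assumptions made for illustration.

```r
library(parallel)

# Fork-based helpers such as mclapply() can only use several cores on
# Unix-alikes; on Windows, mc.cores > 1 raises an error, so fall back to 1.
is_windows <- .Platform$OS.type == "windows"
n_workers <- if (is_windows) 1L else 2L
forked <- mclapply(1:4, function(i) i^2, mc.cores = n_workers)

# Socket (PSOCK) clusters start separate R processes and are also
# available on Windows.
cl <- makeCluster(2)
sock <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

Both calls return the same list of squares; only the mechanism used to distribute the work differs.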


Thanks for the quick reply!

I do not know if it's appropriate, but instead of starting a new thread I wanted to ask you a follow-up question.

I have a huge dataset (32 files), and when I run the dba.count command it sometimes skips some files and doesn't count their reads. I have run it several times, and every time the files that get "skipped" are different. I do not know if I'm running out of memory or what else could cause this behavior. I have resorted to rerunning it until all the files are read, but that is very time-consuming.

This is an example of the messages I get after running dba.count:

Warning messages:
1: In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.
2: In DGEList(counts, lib.size = libsize, group = groups, genes = as.character(1:nrow(counts))) :
  library size of zero detected
3: In max(abs(logR)) : no non-missing arguments to max; returning -Inf

Hmm. Since you are running serially anyway, it may be best to set bParallel=FALSE and see if that is any better.

This is a tough one to debug remotely. One thing you could try if you get desperate is to set:

> debug(DiffBind:::pv.do_getCounts)

and then type "c" whenever it stops.


I tried it with bParallel=FALSE but I am still getting files skipped. Sometimes it's just one, sometimes it's 10 of them.

I am running the debug. What should I look for? Does it give an automated report at the end?

Alternatively, I could add a line so it stops after finding any library with size 0. At least that way I do not have to wait until it has processed all the files to know if it has read them.
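The early-stop idea above can be sketched in plain R, assuming the counts are available as a matrix with one column per sample. The helper `check_library_sizes` and the toy matrix are hypothetical and not part of DiffBind; how to extract such a matrix from a DBA object is left open here.

```r
# Hypothetical helper: stop as soon as any sample has a library size of zero.
check_library_sizes <- function(counts) {
  lib_sizes <- colSums(counts)                 # total reads per sample (column)
  empty <- names(lib_sizes)[lib_sizes == 0]
  if (length(empty) > 0) {
    stop("Samples with zero library size: ", paste(empty, collapse = ", "))
  }
  invisible(lib_sizes)
}

# Toy data (made up): the second sample has no reads at all.
toy <- matrix(c(5, 3, 0, 0, 2, 7), nrow = 2,
              dimnames = list(NULL, c("wt_D_1", "wt_D_2", "wt_D_3")))

# Capture the error message instead of aborting the session.
result <- tryCatch(check_library_sizes(toy),
                   error = function(e) conditionMessage(e))
```

Running a check like this after each sample is read, rather than once at the end, would avoid waiting for all 32 files before finding out that one was skipped.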

Thanks a lot for your help!

Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK

No report; I was just thinking that stopping in the debugger would change the timing.

 

-R
