Hi,
I am trying to run DiffBind using parallel execution, but it does not detect multiple cores, even though R itself does:
> parallel::detectCores()
[1] 8
DiffBind, however, falls back to serial execution. Here's what I see with a test run:
> test = dba(sampleSheet = "TEST.xls")
wt_D_1 wt D 1 bayes
wt_D_2 wt D 2 bayes
> test.counts = dba.count(test, minOverlap=1)
Sample: 01_wt_D_1_BAM_MD_asBED.bed125
Sample: 02_wt_D_2_BAM_MD_asBED.bed125
Sample: Input_files/25_wt_D_INP_BAM_MD_asBED.bed125
Warning message:
In dba.multicore.init(DBA$config) :
  Parallel execution unavailable: executing serially.
What should I do to run dba.count in parallel? It would reduce my analysis time considerably. If you need more information, or want me to run any other diagnostic command, please ask.
Thanks!
PS: Here is my sessionInfo() output, in case it helps:
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] DiffBind_2.4.8             SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
 [6] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7           BiocGenerics_0.22.1       
loaded via a namespace (and not attached):
 [1] edgeR_3.18.1             bit64_0.9-7              splines_3.4.0            gtools_3.5.0             assertthat_0.2.0        
 [6] latticeExtra_0.6-28      amap_0.8-14              RBGL_1.52.0              blob_1.1.0               GenomeInfoDbData_0.99.0 
[11] Rsamtools_1.28.0         ggrepel_0.7.0            Category_2.42.1          pillar_1.1.0             RSQLite_2.0             
[16] backports_1.1.2          lattice_0.20-35          glue_1.2.0               limma_3.32.10            digest_0.6.14           
[21] RColorBrewer_1.1-2       XVector_0.16.0           checkmate_1.8.5          colorspace_1.3-2         Matrix_1.2-12           
[26] plyr_1.8.4               GSEABase_1.38.2          XML_3.98-1.9             pkgconfig_2.0.1          pheatmap_1.0.8          
[31] ShortRead_1.34.2         biomaRt_2.32.1           genefilter_1.58.1        zlibbioc_1.22.0          xtable_1.8-2            
[36] GO.db_3.4.1              scales_0.5.0             brew_1.0-6               gdata_2.18.0             BiocParallel_1.10.1     
[41] tibble_1.4.1             annotate_1.54.0          ggplot2_2.2.1            GenomicFeatures_1.28.5   lazyeval_0.2.1          
[46] XLConnect_0.2-13         magrittr_1.5             survival_2.41-3          memoise_1.1.0            systemPipeR_1.10.2      
[51] gplots_3.0.1             hwriter_1.3.2            GOstats_2.42.0           graph_1.54.0             tools_3.4.0             
[56] data.table_1.10.4-3      BBmisc_1.11              sendmailR_1.2-1          munsell_0.4.3            locfit_1.5-9.1          
[61] bindrcpp_0.2             AnnotationDbi_1.38.2     Biostrings_2.44.2        compiler_3.4.0           caTools_1.17.1          
[66] rlang_0.1.6              grid_3.4.0               RCurl_1.95-4.10          rjson_0.2.15             AnnotationForge_1.18.2  
[71] base64enc_0.1-3          bitops_1.0-6             gtable_0.2.0             DBI_0.7                  R6_2.2.2                
[76] GenomicAlignments_1.12.2 dplyr_0.7.4              rtracklayer_1.36.6       bit_1.1-12               bindr_0.1               
[81] XLConnectJars_0.2-13     KernSmooth_2.23-15       rJava_0.9-9              stringi_1.1.6            BatchJobs_1.7           
[86] Rcpp_0.12.14
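The sessionInfo() output shows R running on Windows. A plausible explanation, and it is only an assumption since DiffBind's parallel back end is not shown in this thread, is that DiffBind 2.x parallelizes by forking the R process, as `parallel::mclapply()` does, and forking is not available on Windows. The sketch below uses only base R's `parallel` package (it does not call DiffBind) to check whether forked parallelism works on the current machine.

```r
library(parallel)

# Report the OS family and the number of cores R can see.
os_type <- .Platform$OS.type   # "windows" or "unix"
n_cores <- detectCores()
cat("OS type:", os_type, "| cores detected:", n_cores, "\n")

# mclapply() parallelizes by forking the R process, which Windows does not
# support: there, mc.cores > 1 raises an error instead of running in parallel.
fork_ok <- tryCatch({
  res <- mclapply(1:2, function(i) i * 2, mc.cores = 2)
  identical(unlist(res), c(2, 4))
}, error = function(e) FALSE)
cat("Forked parallel execution available:", fork_ok, "\n")
```

If this prints FALSE while detectCores() reports 8, the cores are visible to R but cannot be used through forking, which would be consistent with the "executing serially" warning above.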

Thanks for the quick reply!
I don't know if it's appropriate, but instead of starting a new thread I'd like to ask a follow-up question.
I have a huge dataset (32 files), and when I run dba.count it sometimes skips some files and doesn't count their reads. I have run it several times, and each time different files get "skipped". I don't know whether I'm running out of memory or what else could be causing this. I have resorted to rerunning it until all the files are read, but that is very time-consuming.
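The rerun-until-complete workaround described above can at least be automated. This is a minimal sketch in plain R, assuming two hypothetical user-supplied functions that are not part of DiffBind: `count_fun()`, which performs one counting pass and returns its result, and `get_lib_sizes()`, which extracts a numeric vector of per-sample library sizes (ideally named by sample) from that result. It does not address the underlying cause of the skipped files.

```r
# Re-run a counting step until every sample has a non-zero library size,
# giving up after max_tries attempts.
# count_fun and get_lib_sizes are hypothetical, user-supplied functions.
count_until_complete <- function(count_fun, get_lib_sizes, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- count_fun()
    sizes <- get_lib_sizes(result)
    empty <- which(sizes == 0)          # indices of samples with zero reads
    if (length(empty) == 0) {
      message("All samples counted on attempt ", attempt)
      return(result)
    }
    # Report sample names if available, otherwise their positions.
    ids <- if (is.null(names(sizes))) as.character(empty) else names(sizes)[empty]
    message("Attempt ", attempt, ": zero reads for ", paste(ids, collapse = ", "))
  }
  stop("Some samples still had zero reads after ", max_tries, " attempts")
}
```

For example, `count_fun` could be `function() dba.count(test, minOverlap = 1, bParallel = FALSE)`; how to write `get_lib_sizes()` depends on where your DiffBind version stores library sizes, which is not covered here.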
This is an example of the message I get after running dba.count:
Hmmn. Since you are running serially anyway, it may be best to set bParallel=FALSE and see if that is any better.
This is a tough one to debug remotely. One thing you could try if you get desperate is to set:
and then type "c" whenever it stops.
I tried bParallel=FALSE, but files are still being skipped: sometimes just one, sometimes as many as 10.
I am running the debugger now. What should I look for? Does it produce an automated report at the end?
Alternatively, I could add a check so that it stops as soon as it finds a library of size 0. That way, at least, I would not have to wait until all the files have been processed to find out whether they were all read.
Thanks a lot for your help!
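The zero-size check proposed above can be written as a small standalone function. This is a sketch under the assumption that the per-sample library sizes are already available as a numeric vector named by sample; `stop_if_empty_libraries()` is a hypothetical helper, and extracting the sizes from the DiffBind object is left out. Run right after a counting pass, it fails immediately with the names of the empty samples instead of requiring a manual inspection of every file; stopping in the middle of the counting itself would presumably require changes inside DiffBind.

```r
# Stop with an informative error if any sample has a library size of zero.
# lib_sizes: numeric vector of per-sample read counts, ideally named by
# sample (hypothetical input; how to obtain it from DiffBind is not shown).
stop_if_empty_libraries <- function(lib_sizes) {
  if (anyNA(lib_sizes)) {
    stop("Library sizes contain NA values")
  }
  empty <- which(lib_sizes == 0)
  if (length(empty) > 0) {
    ids <- if (is.null(names(lib_sizes))) as.character(empty) else names(lib_sizes)[empty]
    stop("Zero reads counted for: ", paste(ids, collapse = ", "))
  }
  invisible(TRUE)
}
```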