Hej Rory, Gord!
I get an error message from DiffBind dba.peakset when trying to derive a consensus peakset.
The error is: Error in if (res >= minval) {: missing value where TRUE/FALSE needed.
I have tracked down that error as far as I could and it seem to originate in the c code called by pv.dovectors that introduces NAs in the matrices it returns.
My dataset is unusual in a number of ways. We have used MACS2 to call peaks (these are broad peaks). The genome used is a draft, with >200k scaffolds and almost 30,000 have a peak reported.
I am not sure where the error originates; if it’s MAC OSX related, if it’s because of the large number of reference sequences or if it’s because I’m using MACS2.
I have documented all the steps there: https://umu.box.com/s/5i6u3bpf398uyd736dy7y6bexv07p82n.
I could also give you access to the data if need be; just let me know.
My session info:
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.3 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets
## [8] methods   base     
##
## other attached packages:
##  [1] DiffBind_1.14.3         RSQLite_1.0.0          
##  [3] DBI_0.3.1               locfit_1.5-9.1         
##  [5] GenomicAlignments_1.4.1 Rsamtools_1.20.1       
##  [7] Biostrings_2.36.0       XVector_0.8.0          
##  [9] limma_3.24.3            GenomicRanges_1.20.3   
## [11] GenomeInfoDb_1.4.0      IRanges_2.2.1          
## [13] S4Vectors_0.6.0         BiocGenerics_0.14.0    
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.11.6            lattice_0.20-31        GO.db_3.1.2           
##  [4] gtools_3.4.2           digest_0.6.8           plyr_1.8.2            
##  [7] BatchJobs_1.6          futile.options_1.0.0   ShortRead_1.26.0      
## [10] evaluate_0.7           ggplot2_1.0.1          gplots_2.17.0         
## [13] zlibbioc_1.14.0        annotate_1.46.0        gdata_2.16.1          
## [16] Matrix_1.2-0           checkmate_1.5.2        rmarkdown_0.5.1       
## [19] systemPipeR_1.2.0      proto_0.3-10           GOstats_2.34.0        
## [22] splines_3.2.0          BiocParallel_1.2.1     stringr_1.0.0         
## [25] pheatmap_1.0.2         munsell_0.4.2          sendmailR_1.2-1       
## [28] base64enc_0.1-2        BBmisc_1.9             htmltools_0.2.6       
## [31] fail_1.2               edgeR_3.10.0           XML_3.98-1.1          
## [34] AnnotationForge_1.10.1 MASS_7.3-40            bitops_1.0-6          
## [37] grid_3.2.0             RBGL_1.44.0            xtable_1.7-4          
## [40] GSEABase_1.30.1        gtable_0.1.2           magrittr_1.5          
## [43] formatR_1.2            scales_0.2.4           graph_1.46.0          
## [46] KernSmooth_2.23-14     amap_0.8-14            stringi_0.4-1         
## [49] hwriter_1.3.2          reshape2_1.4.1         genefilter_1.50.0     
## [52] latticeExtra_0.6-26    futile.logger_1.4.1    brew_1.0-6            
## [55] rjson_0.2.15           lambda.r_1.1.7         RColorBrewer_1.1-2    
## [58] tools_3.2.0            Biobase_2.28.0         Category_2.34.0       
## [61] survival_2.38-1        yaml_2.1.13            AnnotationDbi_1.30.1  
## [64] colorspace_1.2-6       caTools_1.17.1         knitr_1.10.5

Hej Gord!
Thanks for the fast answer, I'll share the data with you offline in a bit.
Just as a side question, what settings do you recommend for reading MACS2 broadpeaks in? Using peakFormat="bed" seemed to do the trick, but do you plan to add macs2 as a supported peak caller in DiffBind?
Cheers,
Nico
Btw, I get the same with the devel version 1.15.2.
Adding peak formats is pretty minor; I'll put it on the to-do list. Should add HOMER too; I've been using it a lot these days.
For now, BED should work fine for broad peaks; all DiffBind really needs to know is chr, start, end, and which column the score is in (and whether higher is better).
Cheers...
I understand; it's just that if one does not precise the peakFormat, DiffBind ends up converting the first line of every macs2 broadpeak files into the colnames of the corresponding data.frame (in dba$peaks); i.e. it assumes the peak files have a header whereas they do not. So the first reported peaks gets lost.
That's a separate bug, which is (now) on my to-do list as well...