Hej Rory, Gord!
I get an error message from DiffBind dba.peakset when trying to derive a consensus peakset.
The error is: Error in if (res >= minval) {: missing value where TRUE/FALSE needed.
I have tracked down that error as far as I could and it seem to originate in the c code called by pv.dovectors that introduces NAs in the matrices it returns.
My dataset is unusual in a number of ways. We have used MACS2 to call peaks (these are broad peaks). The genome used is a draft, with >200k scaffolds and almost 30,000 have a peak reported.
I am not sure where the error originates; if it’s MAC OSX related, if it’s because of the large number of reference sequences or if it’s because I’m using MACS2.
I have documented all the steps there: https://umu.box.com/s/5i6u3bpf398uyd736dy7y6bexv07p82n.
I could also give you access to the data if need be; just let me know.
My session info:
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.3 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] DiffBind_1.14.3 RSQLite_1.0.0
## [3] DBI_0.3.1 locfit_1.5-9.1
## [5] GenomicAlignments_1.4.1 Rsamtools_1.20.1
## [7] Biostrings_2.36.0 XVector_0.8.0
## [9] limma_3.24.3 GenomicRanges_1.20.3
## [11] GenomeInfoDb_1.4.0 IRanges_2.2.1
## [13] S4Vectors_0.6.0 BiocGenerics_0.14.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.11.6 lattice_0.20-31 GO.db_3.1.2
## [4] gtools_3.4.2 digest_0.6.8 plyr_1.8.2
## [7] BatchJobs_1.6 futile.options_1.0.0 ShortRead_1.26.0
## [10] evaluate_0.7 ggplot2_1.0.1 gplots_2.17.0
## [13] zlibbioc_1.14.0 annotate_1.46.0 gdata_2.16.1
## [16] Matrix_1.2-0 checkmate_1.5.2 rmarkdown_0.5.1
## [19] systemPipeR_1.2.0 proto_0.3-10 GOstats_2.34.0
## [22] splines_3.2.0 BiocParallel_1.2.1 stringr_1.0.0
## [25] pheatmap_1.0.2 munsell_0.4.2 sendmailR_1.2-1
## [28] base64enc_0.1-2 BBmisc_1.9 htmltools_0.2.6
## [31] fail_1.2 edgeR_3.10.0 XML_3.98-1.1
## [34] AnnotationForge_1.10.1 MASS_7.3-40 bitops_1.0-6
## [37] grid_3.2.0 RBGL_1.44.0 xtable_1.7-4
## [40] GSEABase_1.30.1 gtable_0.1.2 magrittr_1.5
## [43] formatR_1.2 scales_0.2.4 graph_1.46.0
## [46] KernSmooth_2.23-14 amap_0.8-14 stringi_0.4-1
## [49] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.50.0
## [52] latticeExtra_0.6-26 futile.logger_1.4.1 brew_1.0-6
## [55] rjson_0.2.15 lambda.r_1.1.7 RColorBrewer_1.1-2
## [58] tools_3.2.0 Biobase_2.28.0 Category_2.34.0
## [61] survival_2.38-1 yaml_2.1.13 AnnotationDbi_1.30.1
## [64] colorspace_1.2-6 caTools_1.17.1 knitr_1.10.5
Hej Gord!
Thanks for the fast answer, I'll share the data with you offline in a bit.
Just as a side question, what settings do you recommend for reading MACS2 broadpeaks in? Using peakFormat="bed" seemed to do the trick, but do you plan to add macs2 as a supported peak caller in DiffBind?
Cheers,
Nico
Btw, I get the same with the devel version 1.15.2.
Adding peak formats is pretty minor; I'll put it on the to-do list. Should add HOMER too; I've been using it a lot these days.
For now, BED should work fine for broad peaks; all DiffBind really needs to know is chr, start, end, and which column the score is in (and whether higher is better).
Cheers...
I understand; it's just that if one does not precise the peakFormat, DiffBind ends up converting the first line of every macs2 broadpeak files into the colnames of the corresponding data.frame (in dba$peaks); i.e. it assumes the peak files have a header whereas they do not. So the first reported peaks gets lost.
That's a separate bug, which is (now) on my to-do list as well...