Question

DiffBind fails to derive consensus peaksets

0

Entering edit mode

Nicolas Delhomme ▴ 320

@nicolas-delhomme-6252

Last seen 7.1 years ago

Sweden

Hej Rory, Gord!

I get an error message from DiffBind dba.peakset when trying to derive a consensus peakset.

The error is: Error in if (res >= minval) {: missing value where TRUE/FALSE needed.

I have tracked down that error as far as I could and it seem to originate in the c code called by pv.dovectors that introduces NAs in the matrices it returns.

My dataset is unusual in a number of ways. We have used MACS2 to call peaks (these are broad peaks). The genome used is a draft, with >200k scaffolds and almost 30,000 have a peak reported.

I am not sure where the error originates; if it’s MAC OSX related, if it’s because of the large number of reference sequences or if it’s because I’m using MACS2.

I have documented all the steps there: https://umu.box.com/s/5i6u3bpf398uyd736dy7y6bexv07p82n.

I could also give you access to the data if need be; just let me know.

My session info:

## R version 3.2.0 (2015-04-16)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.3 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4    parallel stats     graphics grDevices utils     datasets
## [8] methods   base
##
## other attached packages:
## [1] DiffBind_1.14.3         RSQLite_1.0.0
## [3] DBI_0.3.1               locfit_1.5-9.1
## [5] GenomicAlignments_1.4.1 Rsamtools_1.20.1
## [7] Biostrings_2.36.0       XVector_0.8.0
## [9] limma_3.24.3            GenomicRanges_1.20.3
## [11] GenomeInfoDb_1.4.0      IRanges_2.2.1
## [13] S4Vectors_0.6.0         BiocGenerics_0.14.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.11.6            lattice_0.20-31        GO.db_3.1.2
## [4] gtools_3.4.2           digest_0.6.8           plyr_1.8.2
## [7] BatchJobs_1.6          futile.options_1.0.0   ShortRead_1.26.0
## [10] evaluate_0.7           ggplot2_1.0.1          gplots_2.17.0
## [13] zlibbioc_1.14.0        annotate_1.46.0        gdata_2.16.1
## [16] Matrix_1.2-0           checkmate_1.5.2        rmarkdown_0.5.1
## [19] systemPipeR_1.2.0      proto_0.3-10           GOstats_2.34.0
## [22] splines_3.2.0          BiocParallel_1.2.1     stringr_1.0.0
## [25] pheatmap_1.0.2         munsell_0.4.2          sendmailR_1.2-1
## [28] base64enc_0.1-2        BBmisc_1.9             htmltools_0.2.6
## [31] fail_1.2               edgeR_3.10.0           XML_3.98-1.1
## [34] AnnotationForge_1.10.1 MASS_7.3-40            bitops_1.0-6
## [37] grid_3.2.0             RBGL_1.44.0            xtable_1.7-4
## [40] GSEABase_1.30.1        gtable_0.1.2           magrittr_1.5
## [43] formatR_1.2            scales_0.2.4           graph_1.46.0
## [46] KernSmooth_2.23-14     amap_0.8-14            stringi_0.4-1
## [49] hwriter_1.3.2          reshape2_1.4.1         genefilter_1.50.0
## [52] latticeExtra_0.6-26    futile.logger_1.4.1    brew_1.0-6
## [55] rjson_0.2.15           lambda.r_1.1.7         RColorBrewer_1.1-2
## [58] tools_3.2.0            Biobase_2.28.0         Category_2.34.0
## [61] survival_2.38-1        yaml_2.1.13            AnnotationDbi_1.30.1
## [64] colorspace_1.2-6       caTools_1.17.1         knitr_1.10.5

software error chipseq diffbind • 2.2k views

ADD COMMENT • link updated 10.6 years ago by Gord Brown ▴ 670 • written 10.6 years ago by Nicolas Delhomme ▴ 320

score 0 · Answer 1 · 2015-05-07

0

Entering edit mode

Gord Brown ▴ 670

@gord-brown-5664

Last seen 4.9 years ago

United Kingdom

Hi,

Can't see what might be causing it. If you share your sample sheet and peaks, I'll try to reproduce it.

Cheers,

- Gord

ADD COMMENT • link 10.6 years ago Gord Brown ▴ 670

0

Entering edit mode

Hej Gord!

Thanks for the fast answer, I'll share the data with you offline in a bit.

Just as a side question, what settings do you recommend for reading MACS2 broadpeaks in? Using peakFormat="bed" seemed to do the trick, but do you plan to add macs2 as a supported peak caller in DiffBind?

Cheers,

Nico

Btw, I get the same with the devel version 1.15.2.

ADD REPLY • link 10.6 years ago Nicolas Delhomme ▴ 320

0

Entering edit mode

Adding peak formats is pretty minor; I'll put it on the to-do list. Should add HOMER too; I've been using it a lot these days.

For now, BED should work fine for broad peaks; all DiffBind really needs to know is chr, start, end, and which column the score is in (and whether higher is better).

Cheers...

ADD REPLY • link 10.6 years ago Gord Brown ▴ 670

0

Entering edit mode

I understand; it's just that if one does not precise the peakFormat, DiffBind ends up converting the first line of every macs2 broadpeak files into the colnames of the corresponding data.frame (in dba$peaks); i.e. it assumes the peak files have a header whereas they do not. So the first reported peaks gets lost.

ADD REPLY • link 10.6 years ago Nicolas Delhomme ▴ 320

0

Entering edit mode

That's a separate bug, which is (now) on my to-do list as well...

ADD REPLY • link 10.6 years ago Gord Brown ▴ 670

score 0 · Answer 2 · 2015-05-07

0

Entering edit mode

Gord Brown ▴ 670

@gord-brown-5664

Last seen 4.9 years ago

United Kingdom

Hi,

The problem is that the scores in your peak files are a bit unusual. Many are very large (10^36), while others are negative. If I alter the scores to fit the range (1,10000) then the error does not occur.

Given that MACS2 generated these, your copy of MACS2 may be off its medication. Or some other processing step went wrong. The numbers look a bit like an incorrect conversion between integer and float.

We'll add something to warn if the scores are implausible.

Cheers,

- Gord

ADD COMMENT • link 10.6 years ago Gord Brown ▴ 670

0

Entering edit mode

Hej Gord! I see. I should have thought of checking that, but there was no tell-tale signs in the results we've looked at so far. I'll go backward in the process and figure out what happened. Sorry taking your time for this :-\ Thanks for your time and figuring this out! Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme The Street Lab Department of Plant Physiology Umeå Plant Science Center Tel: +46 90 786 5478 Email: nicolas.delhomme@umu.se SLU - Umeå universitet Umeå S-901 87 Sweden --------------------------------------------------------------- > On 07 May 2015, at 18:02, Gord Brown [bioc] <noreply@bioconductor.org> wrote: > > Activity on a post you are following on support.bioconductor.org > User Gord Brown wrote Answer: DiffBind fails to derive consensus peaksets: > > > Hi, > > The problem is that the scores in your peak files are a bit unusual. Many are very large (10^36), while others are negative. If I alter the scores to fit the range (1,10000) then the error does not occur. > > Given that MACS2 generated these, your copy of MACS2 may be off its medication. Or some other processing step went wrong. The numbers look a bit like an incorrect conversion between integer and float. > > We'll add something to warn if the scores are implausible. > > Cheers, > > - Gord > > > You may reply via email or visit A: DiffBind fails to derive consensus peaksets >

ADD REPLY • link 10.6 years ago Nicolas Delhomme ▴ 320