Hello everyone,
I am having problems running diffbind count function. I keep getting the following error (running in R on Mac):
> H3K9me3 = dba.count(H3K9me3, minOverlap=2) Error: Error processing one or more read files. Check warnings(). In addition: Warning messages: 1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) : all scheduled cores encountered errors in user code 2: file '~/Desktop/H3K9me3_bam/Wt1_H2K9me3_chip_sorted.bam.bam' does not appear to be a BAM file (bad magic number)
I checked the numerical code at the end of my bam file and it appears to be correct:
00029de0 7f 02 32 f0 24 b6 00 ed 00 00 1f 8b 08 04 00 00 |..2.$...........| 00029df0 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 |......BC........| 00029e00 00 00 00 00 00 00 |......| 00029e06
I have sorted and resorted with sam tools with no change. The bam file works fine for loading into IGV and such, so it is not unreadable. Any ideas? Thanks!
sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.3 (Yosemite) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils [7] datasets methods base other attached packages: [1] DiffBind_1.14.6 RSQLite_1.0.0 [3] DBI_0.3.1 locfit_1.5-9.1 [5] GenomicAlignments_1.4.1 Rsamtools_1.20.4 [7] Biostrings_2.36.4 XVector_0.8.0 [9] limma_3.24.15 GenomicRanges_1.20.8 [11] GenomeInfoDb_1.4.3 IRanges_2.2.7 [13] S4Vectors_0.6.6 BiocGenerics_0.14.0 [15] BiocInstaller_1.18.4 loaded via a namespace (and not attached): [1] Rcpp_0.12.1 lattice_0.20-33 [3] GO.db_3.1.2 gtools_3.5.0 [5] digest_0.6.8 plyr_1.8.3 [7] futile.options_1.0.0 BatchJobs_1.6 [9] ShortRead_1.26.0 ggplot2_1.0.1 [11] gplots_2.17.0 zlibbioc_1.14.0 [13] annotate_1.46.1 gdata_2.17.0 [15] Matrix_1.2-2 checkmate_1.6.2 [17] systemPipeR_1.2.23 proto_0.3-10 [19] GOstats_2.34.0 splines_3.2.2 [21] BiocParallel_1.2.21 stringr_1.0.0 [23] pheatmap_1.0.7 munsell_0.4.2 [25] sendmailR_1.2-1 base64enc_0.1-3 [27] BBmisc_1.9 fail_1.2 [29] edgeR_3.10.2 XML_3.98-1.3 [31] AnnotationForge_1.10.1 MASS_7.3-44 [33] bitops_1.0-6 grid_3.2.2 [35] RBGL_1.44.0 xtable_1.7-4 [37] GSEABase_1.30.2 gtable_0.1.2 [39] magrittr_1.5 scales_0.3.0 [41] graph_1.46.0 KernSmooth_2.23-15 [43] amap_0.8-14 stringi_0.5-5 [45] hwriter_1.3.2 reshape2_1.4.1 [47] genefilter_1.50.0 latticeExtra_0.6-26 [49] futile.logger_1.4.1 brew_1.0-6 [51] rjson_0.2.15 lambda.r_1.1.7 [53] RColorBrewer_1.1-2 tools_3.2.2 [55] Biobase_2.28.0 Category_2.34.2 [57] survival_2.38-3 AnnotationDbi_1.30.1 [59] colorspace_1.2-6 caTools_1.17.1
Hi,
The BAM format requires that the *first* 4 bytes of a legal BAM file (after uncompressing) be 'BAM\1'. All BAMs I've used have it. You can check yours like so:
The first 3 characters should be "BAM" (the next will probably display as an "r", but it's really ASCII 1). If you see something else, it's not a legal BAM file. It's possible that samtools doesn't check for the magic number, though.
Let me know what you get when you try.
- Gord
Does the simpler task of reading the bam file succeed, e.g.,
(is that the actual name of the bam file, with '.bam.bam'? Does it help to expand the home directory '~/'?) or
Good point, I can almost guarantee that tilde expansion isn't supported by my code. So try with an absolute path. The little bit of code that checks for the magic number could be giving the wrong error message, if it can't read the file at all. (I'll check and update the code if so.)
So yes, it turns out that
gzopen
(which my code uses to check for the magic number) doesn't do tilde expansion, and my code incorrectly produces the wrong error. So absolute paths should fix the problem. I'll fix this bug for the next release. Sorry for the inconvenience.Over on SEQanswers, where this thread originated, I suggested trying setting
bUseSummarizeOverlaps=TRUE
, which uses the standardGenomicAlignments
functions to do the counting (BamFileList
,ScanBamParam
, andsummarizeOverlaps
) . It worked fine using those functions, so our code must be handling it differently.-R