Search
Question: Error using countBam (Rsamtools package)
1
2.1 years ago by
t.kuilman130
Netherlands
t.kuilman130 wrote:

I have run a very large bam (186 G) file through the countBam function from the Rsamtools package, and got an unexpected error:

> library(Rsamtools)
> countBam(large.bam.file)
Error in value[[3L]](cond) : 'countBam' failed:
record: -2023186199
error: 0
file: /home/NFS/research_projects/combat_cancer/Lung_cancer_data_set/Mouse/FVB_NJ.bam
index:

At first I thought that there was an error in the bam file, but we have checked the integrity of the bam file compared to the source using md5, and additionally running samtools from the command line on the same file just works fine:

$samtools view -c ./large_bam_file.bam 2271781097 Also, I have used countBam successfully for other smaller bam files. Does anyone have any suggestions as to where this error stems from? Here's my sessionInfo(): > sessionInfo() R version 3.3.0 (2016-05-03) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.1 LTS locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] snow_0.4-1 futile.logger_1.4.1 [3] CopyhelpeR_1.4.0 DNAcopy_1.46.0 [5] chipseq_1.22.0 ShortRead_1.30.0 [7] GenomicAlignments_1.8.0 SummarizedExperiment_1.2.2 [9] Biobase_2.32.0 Rsamtools_1.24.0 [11] Biostrings_2.40.0 XVector_0.12.0 [13] BiocParallel_1.6.2 GenomicRanges_1.24.0 [15] GenomeInfoDb_1.8.1 IRanges_2.6.0 [17] S4Vectors_0.10.0 BiocGenerics_0.18.0 [19] data.table_1.9.6 gtools_3.5.0 [21] matrixStats_0.50.2 loaded via a namespace (and not attached): [1] zlibbioc_1.18.0 lattice_0.20-33 hwriter_1.3.2 [4] tools_3.3.0 grid_3.3.0 latticeExtra_0.6-28 [7] lambda.r_1.1.7 RColorBrewer_1.1-2 futile.options_1.0.0 [10] bitops_1.0-6 chron_2.3-47 ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by t.kuilman130 That looks like an integer overflow problem to me. You have more records in the bam file than can be stored in an integer. Perhaps as a workaround you could count several subsets of the reads (perhaps using ScanBamParam to pick chromosomes) and then combine the individual results as numerics? ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Mike Smith2.7k Thanks for the suggestion. The maximum number an integer can have on my system is as follows: > .Machine$integer.max
[1] 2147483647

Can any of the authors of the Rsamtools package fill us in whether counting reads can only go up to the maximum integer number?

Yes, that's the current limitation; it'll be updated shortly.