Error using countBam (Rsamtools package)
0
1
Entering edit mode
t.kuilman ▴ 170
@tkuilman-6868
Last seen 22 months ago
Netherlands

I have run a very large bam (186 G) file through the countBam function from the Rsamtools package, and got an unexpected error:

> library(Rsamtools)
> countBam(large.bam.file)
  Error in value[[3L]](cond) : 'countBam' failed:
  record: -2023186199
  error: 0
  file: /home/NFS/research_projects/combat_cancer/Lung_cancer_data_set/Mouse/FVB_NJ.bam
  index:

At first I thought that there was an error in the bam file, but we have checked the integrity of the bam file compared to the source using md5, and additionally running samtools from the command line on the same file just works fine:

$ samtools view -c ./large_bam_file.bam
  2271781097

Also, I have used countBam successfully for other smaller bam files. Does anyone have any suggestions as to where this error stems from?

Here's my sessionInfo():

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] snow_0.4-1                 futile.logger_1.4.1
 [3] CopyhelpeR_1.4.0           DNAcopy_1.46.0
 [5] chipseq_1.22.0             ShortRead_1.30.0
 [7] GenomicAlignments_1.8.0    SummarizedExperiment_1.2.2
 [9] Biobase_2.32.0             Rsamtools_1.24.0
[11] Biostrings_2.40.0          XVector_0.12.0
[13] BiocParallel_1.6.2         GenomicRanges_1.24.0
[15] GenomeInfoDb_1.8.1         IRanges_2.6.0
[17] S4Vectors_0.10.0           BiocGenerics_0.18.0
[19] data.table_1.9.6           gtools_3.5.0
[21] matrixStats_0.50.2

loaded via a namespace (and not attached):
 [1] zlibbioc_1.18.0      lattice_0.20-33      hwriter_1.3.2
 [4] tools_3.3.0          grid_3.3.0           latticeExtra_0.6-28
 [7] lambda.r_1.1.7       RColorBrewer_1.1-2   futile.options_1.0.0
[10] bitops_1.0-6         chron_2.3-47
rsamtools countbam • 1.3k views
ADD COMMENT
0
Entering edit mode

That looks like an integer overflow problem to me.  You have more records in the bam file than can be stored in an integer.

Perhaps as a workaround you could count several subsets of the reads (perhaps using ScanBamParam to pick chromosomes) and then combine the individual results as numerics?

ADD REPLY
0
Entering edit mode

Thanks for the suggestion. The maximum number an integer can have on my system is as follows:

> .Machine$integer.max
[1] 2147483647

Can any of the authors of the Rsamtools package fill us in whether counting reads can only go up to the maximum integer number?

ADD REPLY
0
Entering edit mode

Yes, that's the current limitation; it'll be updated shortly.

ADD REPLY
0
Entering edit mode

Thanks for clarifying that. Will it be implemented in the next Bioconductor release?

ADD REPLY

Login before adding your answer.

Traffic: 930 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6