Search
Question: Error using countBam (Rsamtools package)
1
gravatar for t.kuilman
18 months ago by
t.kuilman100
Netherlands
t.kuilman100 wrote:

I have run a very large bam (186 G) file through the countBam function from the Rsamtools package, and got an unexpected error:

> library(Rsamtools)
> countBam(large.bam.file)
  Error in value[[3L]](cond) : 'countBam' failed:
  record: -2023186199
  error: 0
  file: /home/NFS/research_projects/combat_cancer/Lung_cancer_data_set/Mouse/FVB_NJ.bam
  index:

At first I thought that there was an error in the bam file, but we have checked the integrity of the bam file compared to the source using md5, and additionally running samtools from the command line on the same file just works fine:

$ samtools view -c ./large_bam_file.bam
  2271781097

Also, I have used countBam successfully for other smaller bam files. Does anyone have any suggestions as to where this error stems from?

Here's my sessionInfo():

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] snow_0.4-1                 futile.logger_1.4.1
 [3] CopyhelpeR_1.4.0           DNAcopy_1.46.0
 [5] chipseq_1.22.0             ShortRead_1.30.0
 [7] GenomicAlignments_1.8.0    SummarizedExperiment_1.2.2
 [9] Biobase_2.32.0             Rsamtools_1.24.0
[11] Biostrings_2.40.0          XVector_0.12.0
[13] BiocParallel_1.6.2         GenomicRanges_1.24.0
[15] GenomeInfoDb_1.8.1         IRanges_2.6.0
[17] S4Vectors_0.10.0           BiocGenerics_0.18.0
[19] data.table_1.9.6           gtools_3.5.0
[21] matrixStats_0.50.2

loaded via a namespace (and not attached):
 [1] zlibbioc_1.18.0      lattice_0.20-33      hwriter_1.3.2
 [4] tools_3.3.0          grid_3.3.0           latticeExtra_0.6-28
 [7] lambda.r_1.1.7       RColorBrewer_1.1-2   futile.options_1.0.0
[10] bitops_1.0-6         chron_2.3-47
ADD COMMENTlink modified 18 months ago • written 18 months ago by t.kuilman100

That looks like an integer overflow problem to me.  You have more records in the bam file than can be stored in an integer.

Perhaps as a workaround you could count several subsets of the reads (perhaps using ScanBamParam to pick chromosomes) and then combine the individual results as numerics?

ADD REPLYlink modified 18 months ago • written 18 months ago by Mike Smith2.1k

Thanks for the suggestion. The maximum number an integer can have on my system is as follows:

> .Machine$integer.max
[1] 2147483647

Can any of the authors of the Rsamtools package fill us in whether counting reads can only go up to the maximum integer number?

ADD REPLYlink modified 18 months ago • written 18 months ago by t.kuilman100

Yes, that's the current limitation; it'll be updated shortly.

ADD REPLYlink written 18 months ago by Martin Morgan ♦♦ 20k

Thanks for clarifying that. Will it be implemented in the next Bioconductor release?

ADD REPLYlink written 18 months ago by t.kuilman100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 120 users visited in the last hour