Rsamtools error with long read sets
wtimp0 ▴ 10
@wtimp0-6899

Hi,

I keep hitting this error with Rsamtools when loading alignments from long-read data sets (PacBio/Oxford Nanopore):

library(Rsamtools)
# filey holds the path to the BAM file
what=c("qname","cigar")
param=ScanBamParam(what=what, tag=c("NM", "MD"))
bamf=BamFile(filey)
bam=scanBam(bamf, param=param)[[1]]

Error in value[[3L]](cond) :
   'Realloc' could not re-allocate memory (0 bytes)
   file: /mithril/Data/Nanopore/oxford/082814_lambdaunmeth/analysis/bwa400b.bam
   index:

I've included a BAM file (https://www.dropbox.com/s/od6oeeuvzv8dlfj/bwa400b.bam?dl=0) that triggers the error with a minimal number of reads (5). The file itself is only ~80 KB, so I suspect a character overrun when loading the cigar. If I don't load the cigar (loading only qname), the error doesn't occur.

Any suggestions? Right now I'm working around it with a Python script that strips out the cigar and MD tags, but I'd prefer to work just in R.
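A minimal sketch of that kind of workaround, assuming the alignments are first converted to SAM text (for example with `samtools view -h`). The function name `strip_cigar_and_md` is illustrative and is not the script actually used. It relies only on the SAM layout: CIGAR is the sixth tab-separated column, `*` means "not available", and optional tags start at column 12.

```python
def strip_cigar_and_md(sam_line):
    """Return a SAM line with CIGAR replaced by '*' and any MD tag dropped.

    Header lines (starting with '@') are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line

    # Preserve the original line ending, if any.
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")

    # Column 6 (index 5) is the CIGAR string; '*' marks it as unavailable.
    fields[5] = "*"

    # The first 11 columns are mandatory; the rest are optional TAG:TYPE:VALUE fields.
    mandatory = fields[:11]
    optional = [tag for tag in fields[11:] if not tag.startswith("MD:Z:")]

    return "\t".join(mandatory + optional) + newline
```

Applied line by line to the SAM text, this keeps qname and the NM tag intact while removing the long strings. Note that a record without a CIGAR can no longer be used for anything that needs the alignment itself.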

Thanks,

Winston Timp

traceback output:

 traceback()
 8: stop(conditionMessage(err), "\n  file: ", path(file), "\n  index: ",
        index(file))
 7: value[[3L]](cond)
 6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 4: tryCatch({
        .Call(func, .extptr(file), space, flag, simpleCigar, ...)
    }, error = function(err) {
        stop(conditionMessage(err), "\n  file: ", path(file), "\n  index: ",
            index(file))
    })
 3: .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file),
        tmpl, obeyQname(file), asMates(file), param = param)
 2: scanBam(z, param = param)
 1: scanBam(z, param = param)
 >


sessionInfo output:

 sessionInfo()
 R version 3.1.1 (2014-07-10)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] parallel  stats     graphics  grDevices utils     datasets  methods
 [8] base

 other attached packages:
  [1] stringr_0.6.2           mgcv_1.8-3              nlme_3.1-118
  [4] hexbin_1.27.0           xtable_1.7-4            plyr_1.8.1
  [7] reshape2_1.4            ggplot2_1.0.0           ShortRead_1.22.0
 [10] GenomicAlignments_1.0.6 BSgenome_1.32.0         Rsamtools_1.16.1
 [13] GenomicRanges_1.16.4    GenomeInfoDb_1.0.2      BiocParallel_0.6.1
 [16] Biostrings_2.32.1       XVector_0.4.0           IRanges_1.22.10
 [19] BiocGenerics_0.10.0     knitr_1.7

 loaded via a namespace (and not attached):
  [1] base64enc_0.1-2     BatchJobs_1.4       BBmisc_1.7
  [4] Biobase_2.24.0      bitops_1.0-6        brew_1.0-6
  [7] checkmate_1.5.0     codetools_0.2-9     colorspace_1.2-4
 [10] compiler_3.1.1      DBI_0.3.1           digest_0.6.4
 [13] evaluate_0.5.5      fail_1.2            foreach_1.4.2
 [16] formatR_1.0         grid_3.1.1          gtable_0.1.2
 [19] highr_0.3           hwriter_1.3.2       iterators_1.0.7
 [22] labeling_0.3        lattice_0.20-29     latticeExtra_0.6-26
 [25] MASS_7.3-35         Matrix_1.1-4        munsell_0.4.2
 [28] proto_0.3-10        RColorBrewer_1.0-5  Rcpp_0.11.3
 [31] RSQLite_0.11.4      scales_0.2.4        sendmailR_1.2-1
 [34] stats4_3.1.1        tools_3.1.1         zlibbioc_1.10.0
 >
rsamtools software error • 1.2k views

I can reproduce the problem and will look into this, thanks!

@martin-morgan-1513

Thanks for the reproducible example. The problem was that the cigar was longer than a fixed size (32768 characters), and Rsamtools was failing badly. The cigar can now be any size. The fix is in 1.18.1 in release (available around noon on Friday via biocLite(); check the landing page) and in 1.19.2 in devel.
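For anyone on an older release, a rough way to check whether a file is affected is to measure the cigar strings directly. The sketch below is Python, not part of Rsamtools, and assumes SAM text input (for example from `samtools view`). The name `find_long_cigars` and the exact boundary comparison are assumptions; only the 32768-character figure comes from the explanation above.

```python
def find_long_cigars(sam_lines, limit=32768):
    """Return (qname, cigar_length) for records whose CIGAR string exceeds `limit`.

    `limit` defaults to the fixed size mentioned above; whether the old code
    failed at exactly this length or one character earlier is an assumption.
    """
    hits = []
    for line in sam_lines:
        # Skip header lines and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, cigar = fields[0], fields[5]
        if len(cigar) > limit:
            hits.append((qname, len(cigar)))
    return hits
```

Note that this counts characters in the cigar text, not alignment operations, since the limit described was on the string length.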


Thanks Martin, this is as I expected. Normal short-read sequencing would probably never run into such a problem, but I expect that PacBio and now Oxford Nanopore reads might, especially given their insertion/deletion rate.
