I'm having an issue importing (part of) a BAM file into R using the Rsamtools readGAlignmentPairs() function. In particular, I raise a peculiar error ("subscript contains NAs") when using a ScanBamParam object with bamMapqFilter set to not NA and > 0. This only happens with some BAM files, so it may be a problem with the particular BAM file and not the Rsamtools code. I apologize for any mistakes, this is my first post. Any help with the following issue would be greatly appreciated. For example:
gr <- GRanges('chr1',IRanges(1e6,2e6)) sbp <- ScanBamParam(mapqFilter=NA,which=gr) ga <- readGAlignmentPairs(bamFile,param=sbp)
works without issue. However,
gr <- GRanges('chr1',IRanges(1e6,2e6)) sbp <- ScanBamParam(mapqFilter=10,which=gr) ga <- readGAlignmentPairs(bamFile,param=sbp) Error: subscript contains NAs
produces the error. Inspection suggests there are about 10000 reads in this region that have mapq > 10, so I'm not sure why an error is being raised. Here is the traceback()
14: stop(wmsg(...), call. = FALSE) 13: .subscript_error("subscript contains NAs") 12: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs) 11: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs) 10: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE) 9: extractROWS(x, i) 8: extractROWS(x, i) 7: gal[idx2] 6: gal[idx2] 5: .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, use.mcols = use.mcols) 4: readGAlignmentPairs(bam, character(0), use.names = use.names, param = param, with.which_label = with.which_label, strandMode = strandMode) 3: readGAlignmentPairs(bam, character(0), use.names = use.names, param = param, with.which_label = with.which_label, strandMode = strandMode) 2: readGAlignmentPairs(bamFile, param = sbp) 1: readGAlignmentPairs(bamFile, param = sbp)
And my sessionInfo()
R version 3.4.0 (2017-04-21) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.5 LTS Matrix products: default BLAS: /software/free/r/R-3.4.0/lib/R/lib/libRblas.so LAPACK: /software/free/r/R-3.4.0/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=C LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_3.4.1 bedr_1.0.3 [3] pbapply_1.3-3 BSgenome.Hsapiens.UCSC.hg19_1.4.0 [5] BSgenome_1.44.0 rtracklayer_1.36.3 [7] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4 [9] AnnotationDbi_1.38.1 GenomicAlignments_1.12.1 [11] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 [13] matrixStats_0.52.2 Biobase_2.36.2 [15] Rsamtools_1.28.0 Biostrings_2.44.1 [17] XVector_0.16.0 GenomicRanges_1.28.3 [19] GenomeInfoDb_1.12.2 IRanges_2.10.2 [21] S4Vectors_0.14.3 BiocGenerics_0.22.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.11 futile.logger_1.4.3 compiler_3.4.0 R.methodsS3_1.7.1 [5] R.utils_2.5.0 futile.options_1.0.0 bitops_1.0-6 tools_3.4.0 [9] zlibbioc_1.22.0 testthat_1.0.2 biomaRt_2.32.1 digest_0.6.12 [13] bit_1.1-12 RSQLite_2.0 memoise_1.1.0 tibble_1.3.3 [17] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.1 Matrix_1.2-10 [21] DBI_0.7 yaml_2.1.14 VennDiagram_1.6.17 GenomeInfoDbData_0.99.0 [25] bit64_0.9-7 grid_3.4.0 data.table_1.10.4 R6_2.2.2 [29] XML_3.98-1.9 BiocParallel_1.10.1 magrittr_1.5 lambda.r_1.1.9 [33] blob_1.1.0 RCurl_1.95-4.8 crayon_1.3.2 R.oo_1.21.0
Hi,
I would need to be able to reproduce this error in order to troubleshoot. Do you think you can make your BAM file available somewhere? Or a subset of it only as long as it allows me to reproduce the error. Thanks!
H.
Hi Hervé,
Thank you for your response, I apologize I haven't responded sooner. Here is a download link to a small bam file (2.9 Mb) and its associated bam index. The BAM is a sampling of about 15,000 PE reads on chromosome 1 (mapped with BWA to hg19). The index was created with
Rsamtools:::indexBam()
. The error can be created as such:Thanks for providing the file. The problem should be fixed in GenomicAlignments 1.12.2 (release) and 1.13.5 (devel). Both versions should become available via
biocLite()
in the next 48 hours or so.Cheers,
H.