Question: Possible issue with Rsamtools ScanBamParam mapqfilter
0
gravatar for Devin King
2.1 years ago by
Devin King20
Devin King20 wrote:

I'm having an issue importing (part of) a BAM file into R using the Rsamtools readGAlignmentPairs() function. In particular, I raise a peculiar error ("subscript contains NAs") when using a ScanBamParam object with bamMapqFilter set to not NA and > 0.  This only happens with some BAM files, so it may be a problem with the particular BAM file and not the Rsamtools code. I apologize for any mistakes, this is my first post. Any help with the following issue would be greatly appreciated. For example:

gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=NA,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)

works without issue. However,

gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=10,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)
Error: subscript contains NAs

produces the error. Inspection suggests there are about 10000 reads in this region that have mapq > 10, so I'm not sure why an error is being raised. Here is the traceback()

14: stop(wmsg(...), call. = FALSE)
13: .subscript_error("subscript contains NAs")
12: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
        allow.NAs = allow.NAs)
11: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
        allow.NAs = allow.NAs)
10: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
9: extractROWS(x, i)
8: extractROWS(x, i)
7: gal[idx2]
6: gal[idx2]
5: .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, 
       use.mcols = use.mcols)
4: readGAlignmentPairs(bam, character(0), use.names = use.names, 
       param = param, with.which_label = with.which_label, strandMode = strandMode)
3: readGAlignmentPairs(bam, character(0), use.names = use.names, 
       param = param, with.which_label = with.which_label, strandMode = strandMode)
2: readGAlignmentPairs(bamFile, param = sbp)
1: readGAlignmentPairs(bamFile, param = sbp)

And my sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /software/free/r/R-3.4.0/lib/R/lib/libRblas.so
LAPACK: /software/free/r/R-3.4.0/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C         LC_MONETARY=C       
 [6] LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C         LC_TELEPHONE=C      
[11] LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.4.1                      bedr_1.0.3                             
 [3] pbapply_1.3-3                           BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [5] BSgenome_1.44.0                         rtracklayer_1.36.3                     
 [7] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4                 
 [9] AnnotationDbi_1.38.1                    GenomicAlignments_1.12.1               
[11] SummarizedExperiment_1.6.3              DelayedArray_0.2.7                     
[13] matrixStats_0.52.2                      Biobase_2.36.2                         
[15] Rsamtools_1.28.0                        Biostrings_2.44.1                      
[17] XVector_0.16.0                          GenomicRanges_1.28.3                   
[19] GenomeInfoDb_1.12.2                     IRanges_2.10.2                         
[21] S4Vectors_0.14.3                        BiocGenerics_0.22.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11            futile.logger_1.4.3     compiler_3.4.0          R.methodsS3_1.7.1      
 [5] R.utils_2.5.0           futile.options_1.0.0    bitops_1.0-6            tools_3.4.0            
 [9] zlibbioc_1.22.0         testthat_1.0.2          biomaRt_2.32.1          digest_0.6.12          
[13] bit_1.1-12              RSQLite_2.0             memoise_1.1.0           tibble_1.3.3           
[17] lattice_0.20-35         pkgconfig_2.0.1         rlang_0.1.1             Matrix_1.2-10          
[21] DBI_0.7                 yaml_2.1.14             VennDiagram_1.6.17      GenomeInfoDbData_0.99.0
[25] bit64_0.9-7             grid_3.4.0              data.table_1.10.4       R6_2.2.2               
[29] XML_3.98-1.9            BiocParallel_1.10.1     magrittr_1.5            lambda.r_1.1.9         
[33] blob_1.1.0              RCurl_1.95-4.8          crayon_1.3.2            R.oo_1.21.0     
ADD COMMENTlink written 2.1 years ago by Devin King20

Hi,

I would need to be able to reproduce this error in order to troubleshoot. Do you think you can make your BAM file available somewhere? Or a subset of it only as long as it allows me to reproduce the error. Thanks!

H.

ADD REPLYlink written 2.1 years ago by Hervé Pagès ♦♦ 14k

Hi Hervé,

 

Thank you for your response, I apologize I haven't responded sooner. Here is a download link to a small bam file (2.9 Mb) and its associated bam index. The BAM is a sampling of about 15,000 PE reads on chromosome 1 (mapped with BWA to hg19). The index was created with Rsamtools:::indexBam(). The error can be created as such:

bamFile <- 'bwa_sample.bam'
gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=10,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)
Error: subscript contains NAs

 

ADD REPLYlink written 2.0 years ago by Devin King20
1

Thanks for providing the file. The problem should be fixed in GenomicAlignments 1.12.2 (release) and 1.13.5 (devel). Both versions should become available via biocLite() in the next 48 hours or so.

Cheers,

H.

ADD REPLYlink written 2.0 years ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 127 users visited in the last hour