Possible issue with Rsamtools ScanBamParam mapqfilter
0
0
Entering edit mode
Devin King ▴ 20
@devin-king-13481
Last seen 6.0 years ago

I'm having an issue importing (part of) a BAM file into R using the Rsamtools readGAlignmentPairs() function. In particular, I raise a peculiar error ("subscript contains NAs") when using a ScanBamParam object with bamMapqFilter set to not NA and > 0.  This only happens with some BAM files, so it may be a problem with the particular BAM file and not the Rsamtools code. I apologize for any mistakes, this is my first post. Any help with the following issue would be greatly appreciated. For example:

gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=NA,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)

works without issue. However,

gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=10,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)
Error: subscript contains NAs

produces the error. Inspection suggests there are about 10000 reads in this region that have mapq > 10, so I'm not sure why an error is being raised. Here is the traceback()

14: stop(wmsg(...), call. = FALSE)
13: .subscript_error("subscript contains NAs")
12: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
        allow.NAs = allow.NAs)
11: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
        allow.NAs = allow.NAs)
10: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
9: extractROWS(x, i)
8: extractROWS(x, i)
7: gal[idx2]
6: gal[idx2]
5: .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, 
       use.mcols = use.mcols)
4: readGAlignmentPairs(bam, character(0), use.names = use.names, 
       param = param, with.which_label = with.which_label, strandMode = strandMode)
3: readGAlignmentPairs(bam, character(0), use.names = use.names, 
       param = param, with.which_label = with.which_label, strandMode = strandMode)
2: readGAlignmentPairs(bamFile, param = sbp)
1: readGAlignmentPairs(bamFile, param = sbp)

And my sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /software/free/r/R-3.4.0/lib/R/lib/libRblas.so
LAPACK: /software/free/r/R-3.4.0/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C         LC_MONETARY=C       
 [6] LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C         LC_TELEPHONE=C      
[11] LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.4.1                      bedr_1.0.3                             
 [3] pbapply_1.3-3                           BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [5] BSgenome_1.44.0                         rtracklayer_1.36.3                     
 [7] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.4                 
 [9] AnnotationDbi_1.38.1                    GenomicAlignments_1.12.1               
[11] SummarizedExperiment_1.6.3              DelayedArray_0.2.7                     
[13] matrixStats_0.52.2                      Biobase_2.36.2                         
[15] Rsamtools_1.28.0                        Biostrings_2.44.1                      
[17] XVector_0.16.0                          GenomicRanges_1.28.3                   
[19] GenomeInfoDb_1.12.2                     IRanges_2.10.2                         
[21] S4Vectors_0.14.3                        BiocGenerics_0.22.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11            futile.logger_1.4.3     compiler_3.4.0          R.methodsS3_1.7.1      
 [5] R.utils_2.5.0           futile.options_1.0.0    bitops_1.0-6            tools_3.4.0            
 [9] zlibbioc_1.22.0         testthat_1.0.2          biomaRt_2.32.1          digest_0.6.12          
[13] bit_1.1-12              RSQLite_2.0             memoise_1.1.0           tibble_1.3.3           
[17] lattice_0.20-35         pkgconfig_2.0.1         rlang_0.1.1             Matrix_1.2-10          
[21] DBI_0.7                 yaml_2.1.14             VennDiagram_1.6.17      GenomeInfoDbData_0.99.0
[25] bit64_0.9-7             grid_3.4.0              data.table_1.10.4       R6_2.2.2               
[29] XML_3.98-1.9            BiocParallel_1.10.1     magrittr_1.5            lambda.r_1.1.9         
[33] blob_1.1.0              RCurl_1.95-4.8          crayon_1.3.2            R.oo_1.21.0     
rsamtools scanbamparam mapqfilter • 1.4k views
ADD COMMENT
0
Entering edit mode

Hi,

I would need to be able to reproduce this error in order to troubleshoot. Do you think you can make your BAM file available somewhere? Or a subset of it only as long as it allows me to reproduce the error. Thanks!

H.

ADD REPLY
0
Entering edit mode

Hi Hervé,

 

Thank you for your response, I apologize I haven't responded sooner. Here is a download link to a small bam file (2.9 Mb) and its associated bam index. The BAM is a sampling of about 15,000 PE reads on chromosome 1 (mapped with BWA to hg19). The index was created with Rsamtools:::indexBam(). The error can be created as such:

bamFile <- 'bwa_sample.bam'
gr <- GRanges('chr1',IRanges(1e6,2e6))
sbp <- ScanBamParam(mapqFilter=10,which=gr)
ga <- readGAlignmentPairs(bamFile,param=sbp)
Error: subscript contains NAs

 

ADD REPLY
1
Entering edit mode

Thanks for providing the file. The problem should be fixed in GenomicAlignments 1.12.2 (release) and 1.13.5 (devel). Both versions should become available via biocLite() in the next 48 hours or so.

Cheers,

H.

ADD REPLY

Login before adding your answer.

Traffic: 769 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6