Rsamtools isSupplementary argument not working
1
0
Entering edit mode
galeoni • 0
@galeoni-17896
Last seen 6.2 years ago

Hi to everyone,

 I'm trying to use the Rsamtools (latest version: 1.33.5) package in order to filter some .bam files.

I'm facing some problems with the isSupplementary  argument that should recognize reads flagged with the 2048 samflag within the bam file (https://broadinstitute.github.io/picard/explain-flags.html). In particular, I've found discrepancies between the outputs of countBam() and the outputs of samtools view (from bash) that should be equals instead.

For instance, Rsamtools countBam() is returning 14036855 supplementary reads:

> param = ScanBamParam(flag = scanBamFlag(isSupplementaryAlignment = T), what = c("qname","pos","qwidth","rname"))

> countBam(input.bam, param = param)

space start end width                    file  records nucleotides
1    NA    NA  NA    NA input.bam 14036855  3423511754

while samtools view only 146529:

$ samtools view -f 2048 -c input.bam
146529

 

Moreover, the total reads counts with samtools is 14036855, the very same ouput of countBam:

$ samtools view -c input.bam
14036855

 

I did not found discrepancies in the counts using different arguments (like isUnmappedQuery), so I think that since the isSupplementary argument is a recent implementation, could it be that there is still a bug? 

Here below the output from sessioninfo():

 

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] hwriter_1.3.2        Rsamtools_1.33.5     Biostrings_2.48.0    XVector_0.20.0       GenomicRanges_1.32.7 GenomeInfoDb_1.16.0 
[7] IRanges_2.14.12      S4Vectors_0.18.3     BiocGenerics_0.26.0 

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1              Rcpp_0.12.19                lattice_0.20-35             digest_0.6.18              
 [5] plyr_1.8.4                  backports_1.1.2             acepack_1.4.1               RSQLite_2.1.1              
 [9] ggplot2_3.0.0               pillar_1.3.0                zlibbioc_1.26.0             rlang_0.2.2                
[13] lazyeval_0.2.1              rstudioapi_0.8              data.table_1.11.8           annotate_1.58.0            
[17] blob_1.1.1                  rpart_4.1-13                Matrix_1.2-14               checkmate_1.8.5            
[21] splines_3.5.1               BiocParallel_1.14.2         geneplotter_1.58.0          stringr_1.3.1              
[25] foreign_0.8-71              htmlwidgets_1.3             RCurl_1.95-4.11             bit_1.1-14                 
[29] munsell_0.5.0               DelayedArray_0.6.6          compiler_3.5.1              base64enc_0.1-3            
[33] htmltools_0.3.6             nnet_7.3-12                 SummarizedExperiment_1.10.1 tibble_1.4.2               
[37] gridExtra_2.3               htmlTable_1.12              GenomeInfoDbData_1.1.0      Hmisc_4.1-1                
[41] matrixStats_0.54.0          XML_3.98-1.16               crayon_1.3.4                bitops_1.0-6               
[45] grid_3.5.1                  xtable_1.8-3                gtable_0.2.0                DBI_1.0.0                  
[49] magrittr_1.5                scales_1.0.0                stringi_1.2.4               genefilter_1.62.0          
[53] latticeExtra_0.6-28         Formula_1.2-3               RColorBrewer_1.1-2          tools_3.5.1                
[57] bit64_0.9-7                 Biobase_2.40.0              DESeq2_1.20.0               survival_2.42-6            
[61] AnnotationDbi_1.42.1        colorspace_1.3-2            cluster_2.0.7-1             memoise_1.1.0              
[65] knitr_1.20

Thanks in advance for any suggestions and/or answers,

Gleoni

rsamtools isSupplementary scanbamparam scanbamflag countbam • 1.3k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States

Thank you for the report, can you try version 1.33.7, available from

BiocManager::install("Bioconductor/Rsamtools")

now or via BiocManager::install("Rsamtools") (in devel) tomorrow afternoon?

ADD COMMENT
0
Entering edit mode

Thanks for the answer Martin and sorry for the delay in the reply,

I've tried the new version as you were suggesting (v. 1.33.7) but the discrepancy in the counts is still present. As previously reported, the isSupplementary argument seems to not affect the counts of the output. 

Thanks again for your support, 

Galeoni

ADD REPLY
0
Entering edit mode

Can you provide me with a (subset) of your bam file? You can contact me at Martin.Morgan at RoswellPark.org

ADD REPLY
0
Entering edit mode

Hi Martin,

I've just found that with a fresh restart of the system, the v. 1.33.7 worked perfectly! 

Thanks again for the support,

Galeoni

 

ADD REPLY

Login before adding your answer.

Traffic: 286 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6