Hi to everyone,
I'm trying to use the Rsamtools (latest version: 1.33.5) package in order to filter some .bam files.
I'm facing some problems with the isSupplementary argument that should recognize reads flagged with the 2048 samflag within the bam file (https://broadinstitute.github.io/picard/explain-flags.html). In particular, I've found discrepancies between the outputs of countBam() and the outputs of samtools view (from bash) that should be equals instead.
For instance, Rsamtools countBam() is returning 14036855 supplementary reads:
> param = ScanBamParam(flag = scanBamFlag(isSupplementaryAlignment = T), what = c("qname","pos","qwidth","rname")) > countBam(input.bam, param = param) space start end width file records nucleotides 1 NA NA NA NA input.bam 14036855 3423511754
while samtools view only 146529:
$ samtools view -f 2048 -c input.bam
146529
Moreover, the total reads counts with samtools is 14036855, the very same ouput of countBam:
$ samtools view -c input.bam
14036855
I did not found discrepancies in the counts using different arguments (like isUnmappedQuery), so I think that since the isSupplementary argument is a recent implementation, could it be that there is still a bug?
Here below the output from sessioninfo():
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=it_IT.UTF-8
[5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8 LC_PAPER=it_IT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] hwriter_1.3.2 Rsamtools_1.33.5 Biostrings_2.48.0 XVector_0.20.0 GenomicRanges_1.32.7 GenomeInfoDb_1.16.0
[7] IRanges_2.14.12 S4Vectors_0.18.3 BiocGenerics_0.26.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 Rcpp_0.12.19 lattice_0.20-35 digest_0.6.18
[5] plyr_1.8.4 backports_1.1.2 acepack_1.4.1 RSQLite_2.1.1
[9] ggplot2_3.0.0 pillar_1.3.0 zlibbioc_1.26.0 rlang_0.2.2
[13] lazyeval_0.2.1 rstudioapi_0.8 data.table_1.11.8 annotate_1.58.0
[17] blob_1.1.1 rpart_4.1-13 Matrix_1.2-14 checkmate_1.8.5
[21] splines_3.5.1 BiocParallel_1.14.2 geneplotter_1.58.0 stringr_1.3.1
[25] foreign_0.8-71 htmlwidgets_1.3 RCurl_1.95-4.11 bit_1.1-14
[29] munsell_0.5.0 DelayedArray_0.6.6 compiler_3.5.1 base64enc_0.1-3
[33] htmltools_0.3.6 nnet_7.3-12 SummarizedExperiment_1.10.1 tibble_1.4.2
[37] gridExtra_2.3 htmlTable_1.12 GenomeInfoDbData_1.1.0 Hmisc_4.1-1
[41] matrixStats_0.54.0 XML_3.98-1.16 crayon_1.3.4 bitops_1.0-6
[45] grid_3.5.1 xtable_1.8-3 gtable_0.2.0 DBI_1.0.0
[49] magrittr_1.5 scales_1.0.0 stringi_1.2.4 genefilter_1.62.0
[53] latticeExtra_0.6-28 Formula_1.2-3 RColorBrewer_1.1-2 tools_3.5.1
[57] bit64_0.9-7 Biobase_2.40.0 DESeq2_1.20.0 survival_2.42-6
[61] AnnotationDbi_1.42.1 colorspace_1.3-2 cluster_2.0.7-1 memoise_1.1.0
[65] knitr_1.20
Thanks in advance for any suggestions and/or answers,
Gleoni
Thanks for the answer Martin and sorry for the delay in the reply,
I've tried the new version as you were suggesting (v. 1.33.7) but the discrepancy in the counts is still present. As previously reported, the isSupplementary argument seems to not affect the counts of the output.
Thanks again for your support,
Galeoni
Can you provide me with a (subset) of your bam file? You can contact me at Martin.Morgan at RoswellPark.org
Hi Martin,
I've just found that with a fresh restart of the system, the v. 1.33.7 worked perfectly!
Thanks again for the support,
Galeoni