Dear all!
For performance reasons I wanted to extract only a limited set of aligned reads from a specified genomic region, so I set the yieldSize of the BamFile to e.g. 5000. However, scanBam ignores that yieldSize completely and returns all of the reads in the genomic region specified with the which argument of ScanBamParam. Is this intended?
The example code I ran:
> library(Rsamtools)
> myBf <- BamFile(bf, index=paste0(bf, ".bai"), yieldSize=5000, asMates=TRUE)
> myParam <- ScanBamParam(what=scanBamWhat(), tag="MD", flag=scanBamFlag(isUnmappedQuery=FALSE))
> Test <- scanBam(myBf, param=myParam)
> length(Test[[1]]$seq)
[1] 10000
## Running the same with which:
> myParam <- ScanBamParam(what=scanBamWhat(), tag="MD", flag=scanBamFlag(isUnmappedQuery=FALSE), which=GRanges("11", ranges=IRanges(69641087, 69654474)))
> Test <- scanBam(myBf, param=myParam)
> length(Test[[1]]$seq)
[1] 146708
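For anyone without a suitable BAM file at hand, the same comparison can be sketched with the small example file that ships with Rsamtools. This is only a minimal sketch: the yieldSize of 100 and the range on "seq1" are values picked for illustration, not the ones from my data, and what the second call returns may depend on the Rsamtools version.

```r
library(Rsamtools)

# Example BAM file bundled with Rsamtools (its .bai index sits next to it)
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# yieldSize = 100 is an arbitrary illustrative value
exBf <- BamFile(fl, yieldSize = 100)

# Without 'which': the number of records read is capped by yieldSize
res_all <- scanBam(exBf, param = ScanBamParam(what = "seq"))
n_all <- length(res_all[[1]]$seq)

# With 'which': the range below is an assumption chosen to cover many reads
# on "seq1"; compare the number of records returned against yieldSize
exRange <- GRanges("seq1", IRanges(1, 1575))
res_which <- scanBam(exBf, param = ScanBamParam(what = "seq", which = exRange))
n_which <- length(res_which[[1]]$seq)

n_all
n_which
```

If n_which comes out larger than the yieldSize, that reproduces what I see with my own file.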
Thanks for any input
cheers, jo
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin15.0.0/x86_64 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Rsamtools_1.22.0     Biostrings_2.38.1    XVector_0.10.0
[4] GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.1
[7] S4Vectors_0.8.2      BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0      futile.logger_1.4.1  lambda.r_1.1.7
[4] futile.options_1.0.0 BiocParallel_1.4.0   bitops_1.0-6