Question: scanBam with ScanBamParam argument which specified ignores yieldSize. Intentionally?
0
gravatar for Johannes Rainer
3.8 years ago by
Johannes Rainer1.5k
Italy
Johannes Rainer1.5k wrote:

Dear all!

For performance reasons I wanted to extract only a limited set of aligned reads from a specified genomic region and thus I set the yieldSize of the BamFile to e.g. 5000. scanBam however ignores that yieldSize completely and returns all of the reads in the genomic region specified with the argument which of ScanBamParam. Is this intended?

The example code I run:

> library(Rsamtools)
> myBf <- BamFile(bf, index=paste0(bf, ".bai"), yieldSize=5000, asMates=TRUE)
> myParam <- ScanBamParam(what=scanBamWhat(), tag="MD", flag=scanBamFlag(isUnmappedQuery=FALSE))
> Test <- scanBam(myBf, param=myParam)
> length(Test[[1]]$seq)
[1] 10000

## Running the same with which:
> myParam <- ScanBamParam(what=scanBamWhat(), tag="MD", flag=scanBamFlag(isUnmappedQuery=FALSE), which=GRanges("11", ranges=IRanges(69641087, 69654474)))
> Test <- scanBam(myBf, param=myParam)
> length(Test[[1]]$seq)
[1] 146708

 

Thanks for any input

cheers, jo

 

 

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin15.0.0/x86_64 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] Rsamtools_1.22.0     Biostrings_2.38.1    XVector_0.10.0      
[4] GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.1       
[7] S4Vectors_0.8.2      BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0      futile.logger_1.4.1  lambda.r_1.1.7      
[4] futile.options_1.0.0 BiocParallel_1.4.0   bitops_1.0-6        

 

ADD COMMENTlink modified 3.8 years ago by Martin Morgan ♦♦ 23k • written 3.8 years ago by Johannes Rainer1.5k
Answer: scanBam with ScanBamParam argument which specified ignores yieldSize. Intentiona
1
gravatar for Martin Morgan
3.8 years ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:

From ?BamFile

     yieldSize: Number of records to yield each time the file is read
          from using 'scanBam' or, when 'length(bamWhich()) != 0', a
          threshold which yields records in complete ranges whose sum
          first exceeds 'yieldSize'.

So yes, the behavior you see is intentional.  For many ranges, it is often more efficient to iterate through the entire file and subsetByOverlaps(), e.g., using GenomicFiles::reduceByYield() and readGAlignments().

ADD COMMENTlink written 3.8 years ago by Martin Morgan ♦♦ 23k
Answer: scanBam with ScanBamParam argument which specified ignores yieldSize. Intentiona
0
gravatar for TimothéeFlutre
3.8 years ago by
France
TimothéeFlutre70 wrote:

From what I understand, yieldSize=5000 doesn't mean "read only the first 5000 records" but "read all records by batch, with 5000 records per batch". It helps when loading all records in memory at once would be too big.

ADD COMMENTlink written 3.8 years ago by TimothéeFlutre70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 218 users visited in the last hour