Question: Read in specific rows of a BAM file that match a certain RNAME
1
3.7 years ago by
K50
United States
K50 wrote:

Hello,

I am trying to read in a BAM file using RSamtools into R. I only want to read in those rows that match a specific "RNAME" (eg:  I only want to read rows where RNAME = "ref2")

I have looked through the vignette and online resources, and this is what I tried so far.

Obviously my "which" argument for ScanBamParam is incorrect. Could someone help me correct this query ?

If there is no way to do this, is my only choice reading in all the rows in BAM file, and then sub-setting based on my RNAME match ?

inputBam = "test.bam" bamFile <- BamFile(inputBam) seqInfo1 <- seqinfo(bamFile) seqInfo1 Seqinfo object with 2 sequences from an unspecified genome:     seqnames seqlengths isCircular genome     ref1     159322         NA   <NA>     ref2     162114         NA   <NA>

ref= "ref2"
# trying to read only rows in BAM file where RNAME = "ref2"
params <- ScanBamParam(which = GRanges(ref,seqlengths = 162114), what = c("qname", "flag","rname","pos","mapq","cigar"))

Error in ScanBamParam(which = GRanges(ref, seqlengths = 162114), what = c("qname",  : error in evaluating the argument 'which' in selecting a method for function 'ScanBamParam': Error in asMethod(object) :  The character vector to convert to a GRanges object must contain strings of the form "chr1:2501-2800" or  "chr1:2501-2800:+" (".." being also supported as a separator between the start and end positions). Strand can be  "+", "-", "*", or missing.

temp <- Rsamtools::scanBam(file =bam.file , param = params)    



Thank you.

bam scanbamparam • 766 views
modified 3.7 years ago by Martin Morgan ♦♦ 23k • written 3.7 years ago by K50

I would like to add that I am dealing with non-human genomes. So I don't have an IRanges to supply in the "which" condition as mentioned in here: filterBam using rname

Answer: Read in specific rows of a BAM file that match a certain RNAME
2
3.7 years ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:

For a sample file I ran

library(Rsamtools)
example(scanBam)
bfl = BamFile(fl)

Grab the seqinfo, subset to the sequence(s) that you're interested in, and coerce to a GRanges

which <- as(seqinfo(bfl)["seq2"], "GRanges")

scanBam(bfl, param=ScanBamParam(which=which)

Usually it's more convenient to read the data at a slightly higher level, via one of the GenomicAlignments::readGAlignment*() functions.

The error you see is because of the way you're trying to construct a GRanges, which might instead be

which <- GRanges("ref2", IRanges(1, 162114))

and then as above. Usually when you see an error it's helpful to make the function you're invoking less 'nested', to figure out which part is causing problems. Using this strategy might have lead to

> GRanges(ref,seqlengths = 162114)
Error in asMethod(object) :
The character vector to convert to a GRanges object must contain
strings of the form "chr1:2501-2800" or "chr1:2501-2800:+" (".." being
also supported as a separator between the start and end positions).
Strand can be "+", "-", "*", or missing.

So that you'd know where to look.

Thank you for the response.

Coercing the information into GRanges worked.

which <- as(seqinfo(bfl)["seq2"], "GRanges")