Rsamtools filterBam question
0
1
Entering edit mode
Vivek.b ▴ 100
@vivekb-7661
Last seen 2.2 years ago
Germany

Hi Everyone

I have two questions related to Rsamtools :

1. I am trying to filter my bam file by scanning the genome in 1kb bins, and removing reads that match certain criteria. But I will have to bin the genome chromosome by chromosome and run the filtering per chromosome..

I created the following example using the example BAM file , although it's a bit buggy due to NA's.

Suppose I want to only keep reads with highest mapQ per bin.

library(Rsamtools)

bamFile <- system.file("extdata", "ex1.bam", package="Rsamtools")

sparam <- ScanBamParam(what = c("qname", "flag", "rname", "pos", "mapq"),
flag = scanBamFlag(isFirstMateRead = TRUE, isUnmappedQuery = FALSE) )

bam <- scanBam(bamFile, param = sparam)[[1]]

bam <- as.data.frame(bam)

filterfunc <- function(bam) {
# remove NA positions
bam <- na.omit(bam)
# split df by chromosome
chroms <- as.character(unique(bam$rname)) bam2 <- split(bam, bam$rname)

# for each chrom, filter highest mapq in bins
filterHiQ <- function(bam){
chromEnd <- round(max(bam$pos), -2) end <- ifelse(chromEnd < max(bam$pos), chromEnd + 100, chromEnd)
bins <- .bincode(bam$pos, seq(0,end, 100)) tokeep <- lapply(split(bam$mapq, bins), function(x) {
keep <- x == max(x)
return(keep)
})

return(unlist(tokeep))
}

# get filtered bam per chrom
keep <- unlist(lapply(bam2, filterHiQ))
return(keep)

}

This function works on bamFile that I read (returns logical vector), but when I try to create the FilterRule using this function, this returns errors due to NAs removed.

filterBam(bamFile, "/tmp/test.bam" , filter = FilterRules(list(filterfunc)))

Error in seq.default(0, end, 100) : 'to' cannot be NA, NaN or infinite

I got this problem while trying to create an example. However, my question was different.! I wanted to know if there is a way to apply a function (like the filterHiQ i wrote above) under filterRules in such a way that It does it chromosome by chromosome, so I don't have to split the file internally?

2. My second question was related to the ScanBamParam above. I wanted to actually include R1 reads + R2 reads in case R1 is unmapped. So something like isFirstMateRead = TRUE |  hasUnmappedMate = TRUE, but can't find how to do that..

Vivek

P.S. (29th Aug 2016) I just edited the example for clarity. this works if I read the bam file as data frame and run the command. But doesn't work as filterRules

rsamtools • 863 views