Rsamtools filterBam question
0
1
Entering edit mode
Vivek.b ▴ 100
@vivekb-7661
Last seen 3.9 years ago
Germany

Hi Everyone

I have two questions related to Rsamtools :

1. I am trying to filter my bam file by scanning the genome in 1kb bins, and removing reads that match certain criteria. But I will have to bin the genome chromosome by chromosome and run the filtering per chromosome..

I created the following example using the example BAM file , although it's a bit buggy due to NA's.

Suppose I want to only keep reads with highest mapQ per bin.

 

library(Rsamtools)

bamFile <- system.file("extdata", "ex1.bam", package="Rsamtools")

sparam <- ScanBamParam(what = c("qname", "flag", "rname", "pos", "mapq"),
                       flag = scanBamFlag(isFirstMateRead = TRUE, isUnmappedQuery = FALSE) )

bam <- scanBam(bamFile, param = sparam)[[1]]

bam <- as.data.frame(bam)

filterfunc <- function(bam) {
        # remove NA positions
        bam <- na.omit(bam)        
        # split df by chromosome
        chroms <- as.character(unique(bam$rname))
        bam2 <- split(bam, bam$rname)
        
     
        # for each chrom, filter highest mapq in bins
        filterHiQ <- function(bam){
                chromEnd <- round(max(bam$pos), -2)
                end <- ifelse(chromEnd < max(bam$pos), chromEnd + 100, chromEnd)
                bins <- .bincode(bam$pos, seq(0,end, 100))
                
                
                tokeep <- lapply(split(bam$mapq, bins), function(x) {
                        keep <- x == max(x)
                        return(keep)
                })
                
                return(unlist(tokeep))
        }
        
        # get filtered bam per chrom
        keep <- unlist(lapply(bam2, filterHiQ))
        return(keep)
        
}

This function works on bamFile that I read (returns logical vector), but when I try to create the FilterRule using this function, this returns errors due to NAs removed.

filterBam(bamFile, "/tmp/test.bam" , filter = FilterRules(list(filterfunc)))

Error in seq.default(0, end, 100) : 'to' cannot be NA, NaN or infinite

 

I got this problem while trying to create an example. However, my question was different.! I wanted to know if there is a way to apply a function (like the filterHiQ i wrote above) under filterRules in such a way that It does it chromosome by chromosome, so I don't have to split the file internally?

 

2. My second question was related to the ScanBamParam above. I wanted to actually include R1 reads + R2 reads in case R1 is unmapped. So something like isFirstMateRead = TRUE |  hasUnmappedMate = TRUE, but can't find how to do that..

 

Thanks for your help.

Vivek

P.S. (29th Aug 2016) I just edited the example for clarity. this works if I read the bam file as data frame and run the command. But doesn't work as filterRules

 

rsamtools • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 702 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6