Question: Reading alignments above a mapq threshold from BAM files
1
gravatar for Johannes Rainer
4.4 years ago by
Johannes Rainer1.5k
Italy
Johannes Rainer1.5k wrote:

dear All!

I was wondering whether there is a possibility to read only alignments from a BAM file passing a certain mapq threshold. I searched for it but didn't find anything obvious. Currently I'm using the filterBam function to save only the high quality alignments to another BAM file, but that is quite time and disk-space consuming.

If there is no easy way, might that be something that could be added to the ScanBamParam? Something like a mapping quality filter?

cheers, jo

 

rsamtools bam • 1.3k views
ADD COMMENTlink modified 4.4 years ago by Martin Morgan ♦♦ 24k • written 4.4 years ago by Johannes Rainer1.5k
Answer: Reading alignments above a mapq threshold from BAM files
2
gravatar for Hervé Pagès
4.4 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Johannes,

There might be some technical reasons for keeping the filter and param arguments separated but I'll leave this to the Rsamtools experts. In the meantime here is a modified version of scanBam() that supports the filter argument (like filterBam() does):

scanBam2 <- function(file, index=file, ...,
                           param=ScanBamParam(what=scanBamWhat()),
                           filter=FilterRules())
{
    res <- scanBam(file, index=index, ..., param=param)
    if (length(filter) == 0L) 
        return(res)
    lapply(res, function(x) {
        df <- DataFrame(x)
        df <- df[eval(filter, df), ]
        as.list(df)
    })
}

Used together with an appropriate yieldSize, it should perform a little better than the filterBam solution (at least it's no more disk-space consuming :-) ).

The same addition could be made to the readGAlignment*() functions.

Hope this helps,

H.

 

ADD COMMENTlink written 4.4 years ago by Hervé Pagès ♦♦ 14k

Thanks Herve!

ADD REPLYlink written 4.4 years ago by Johannes Rainer1.5k
Answer: Reading alignments above a mapq threshold from BAM files
2
gravatar for Martin Morgan
4.4 years ago by
Martin Morgan ♦♦ 24k
United States
Martin Morgan ♦♦ 24k wrote:

I added mapqFilter to ScanBamParam in the devel branch, Rsamtools version 1.21.12. A more consistent approach to filtering on input is required; these are relatively easy to implement but a bit tedious with the current design.

ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by Martin Morgan ♦♦ 24k

The current development version of Rsamtools is 1.21.11; did you mean 1.21.12 in your post?

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun25k

yep, thanks!

ADD REPLYlink written 4.4 years ago by Martin Morgan ♦♦ 24k

Thanks Martin!

ADD REPLYlink written 4.4 years ago by Johannes Rainer1.5k

Sorry! It was my mistake... I used params instead of param... with param it works.

cheers, jo

 

Dear Martin!

I get now however a strange error message when I try the mapqFilter in the ScanBamParam submitted to the summarizeOverlaps method. The code I used was:

sbp <- ScanBamParam(mapqFilter=mapq.cut)
bfiles <- BamFileList(BamFiles, index=paste0(BamFiles, ".bai"),
                      obeyQname=TRUE,
                      asMates=TRUE, yieldSize=1e+6)
GeneCounts <- summarizeOverlaps(features=EnsGenes , reads=bfiles[[1]],
                                mode="IntersectionStrict",
                                ignore.strand=TRUE, inter.feature=TRUE, 
                                singleEnd=FALSE, fragments=TRUE, params=sbp)

I get then the following error message:

Error in SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,  :
  error in evaluating the argument 'assays' in selecting a method for function 'SummarizedExperiment': Error in validObject(.Object) :
  invalid class "SimpleList" object: invalid object for slot "listData" in class "SimpleList": got class "matrix", should be or extend class "list"

When I run the same code without the params everything is OK. I'll try to dig into the source code, but maybe you have a quick hint what might be wrong here (I could guess that eventually the mapq is not read?).

Thanks and best regards,

jo

 

my sessionInfo:

sessionInfo()
R Under development (unstable) (2015-06-17 r68530)
Platform: x86_64-apple-darwin14.4.0/x86_64 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2         ascii_2.1                 
 [3] Rsubread_1.19.2            EnsDb.Hsapiens.v75_0.99.12
 [5] ensembldb_1.1.5            GenomicFeatures_1.21.13   
 [7] AnnotationDbi_1.31.17      SeqUtils_0.0.3            
 [9] ShortRead_1.27.4           GenomicAlignments_1.5.11  
[11] SummarizedExperiment_0.3.1 Rsamtools_1.21.12         
[13] GenomicRanges_1.21.16      GenomeInfoDb_1.5.8        
[15] Biostrings_2.37.2          XVector_0.9.1             
[17] IRanges_2.3.13             S4Vectors_0.7.9           
[19] BiocParallel_1.3.29        Biobase_2.29.1            
[21] BiocGenerics_0.15.3       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6                  compiler_3.3.0              
 [3] BiocInstaller_1.19.6         futile.logger_1.4.1         
 [5] AnnotationHub_2.1.28         bitops_1.0-6                
 [7] futile.options_1.0.0         tools_3.3.0                 
 [9] zlibbioc_1.15.0              digest_0.6.8                
[11] biomaRt_2.25.1               RSQLite_1.0.0               
[13] lattice_0.20-31              shiny_0.12.1                
[15] DBI_0.3.1                    rtracklayer_1.29.11         
[17] hwriter_1.3.2                httr_1.0.0                  
[19] stringr_1.0.0                grid_3.3.0                  
[21] R6_2.0.1                     XML_3.98-1.3                
[23] latticeExtra_0.6-26          GenomePlotR_0.1.5           
[25] lambda.r_1.1.7               magrittr_1.5                
[27] htmltools_0.2.6              xtable_1.7-4                
[29] mime_0.3                     interactiveDisplayBase_1.7.0
[31] httpuv_1.3.2                 stringi_0.5-5               
[33] RCurl_1.95-4.7      

 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Johannes Rainer1.5k

I wasn't able to reproduce this. SummarizedExperiment and S4Vectors might be out of date (BiocInstaller::biocValid() for hints). traceback() might help.

ADD REPLYlink written 4.4 years ago by Martin Morgan ♦♦ 24k

Sorry, that was all my mistake. I eventually edited the question too late, but I did use params=sbp instead of param=sbp in the function call following the summarizeOverlaps vignette of the GenomicAlignments package, section 5.2 stating "The params argument...". Calling the function with param=sbp everything works nicely.

Thanks
 

ADD REPLYlink written 4.4 years ago by Johannes Rainer1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 326 users visited in the last hour