summarizeOverlaps with mapq filter?
2
1
Entering edit mode
@sang-chul-choi-6026
Last seen 9.6 years ago
Hi, I wonder if I could use "summarizeOverlaps" to count reads with mapping quality (or MAPQ) larger than some value, say, 30. I took the example script in the function's help to specify what function signature I want to use. I used to use "readBamGappedAlignments" to read a BAM file, and filter out reads with MAPQ less than a specified value, and use the filtered GappedAlignments object to count reads mapped on genomic features. But, I like the feature of summarizeOverlaps that I can use to read a list of BAM files. But, I do not know use the summarizeOverlaps function to use only reads with MAPQ larger than some specified value. If this is not possible currently, then I might have to create another BAM file of reads with MAPQ larger than some value and use the BAM file to count overlaps using summarizeOverlaps function. I do not like this two-step procedure. Thank you for your help! SangChul --------------------------------- library(Rsamtools) fls <- list.files(system.file("extdata",package="GenomicRanges"), recursive=TRUE, pattern="*bam$", full=TRUE) names(fls) <- basename(fls) bf <- BamFileList(fls, index=character()) features <- GRanges( seqnames = c(rep("chr2L", 4), rep("chr2R", 5), rep("chr3L", 2)), ranges = IRanges(c(1000, 3000, 4000, 7000, 2000, 3000, 3600, 4000, 7500, 5000, 5400), width=c(rep(500, 3), 600, 900, 500, 300, 900, 300, 500, 500)), "-", group_id=c(rep("A", 4), rep("B", 5), rep("C", 2))) se <- summarizeOverlaps(features, bf) --------------------------------- =============================== Sang Chul Choi Ph.D. Research Associate Sang Chul Choi (222 WRRB IAB) 902 Koyukuk Dr (PO Box 757000) 311 Irving I, University of Alaska Fairbanks Fairbanks, AK 99775-7000 Office +1-907-474-5071 Fax +1-907-474-6967 Google Phone +1-607-542-9362 ===============================
• 1.8k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi SangChul, Unfortuantely ScanBamParam() does not support filtering on specific values of the sam fields. You can subset on the TRUE/FALSE flags, such as the QC flag, ScanBamParam(flag=scanBamFlag(isNotPassingQualityControls=FALSE)) but it sounds like you really want to filter on the map quality. You could write a little a helper function to filter and count the bams. filterAndCountBam <- function(bf, features, mode, mapq_cutoff) { gal <- readGAlignments(bf, param=ScanBamParam(what="mapq")) reads <- gal[mcols(gal)$mapq > mapq_cutoff] summarizeOverlaps(features, reads, mode) } lapply'ing over the list of bam files returns a list of SummarizedExperiments. gr <- GRanges(....) lst <- lapply(bamFileList, filterAndCountBam, features=gr, mode=Union, mapq_cutoff=30) Then cbind the SummarizedExperiments together to combine counts. do.call(cbind, lst) Valerie On 07/03/2013 06:24 PM, Sang Chul Choi wrote: > Hi, > > I wonder if I could use "summarizeOverlaps" to count reads with mapping quality (or MAPQ) larger than some value, say, 30. I took the example script in the function's help to specify what function signature I want to use. I used to use "readBamGappedAlignments" to read a BAM file, and filter out reads with MAPQ less than a specified value, and use the filtered GappedAlignments object to count reads mapped on genomic features. But, I like the feature of summarizeOverlaps that I can use to read a list of BAM files. But, I do not know use the summarizeOverlaps function to use only reads with MAPQ larger than some specified value. > > If this is not possible currently, then I might have to create another BAM file of reads with MAPQ larger than some value and use the BAM file to count overlaps using summarizeOverlaps function. I do not like this two-step procedure. > > Thank you for your help! > > SangChul > > --------------------------------- > library(Rsamtools) > > fls <- list.files(system.file("extdata",package="GenomicRanges"), > recursive=TRUE, pattern="*bam$", full=TRUE) > names(fls) <- basename(fls) > bf <- BamFileList(fls, index=character()) > features <- GRanges( > seqnames = c(rep("chr2L", 4), rep("chr2R", 5), rep("chr3L", 2)), > ranges = IRanges(c(1000, 3000, 4000, 7000, 2000, 3000, 3600, > 4000, 7500, 5000, 5400), > width=c(rep(500, 3), 600, 900, 500, 300, 900, > 300, 500, 500)), "-", > group_id=c(rep("A", 4), rep("B", 5), rep("C", 2))) > > se <- summarizeOverlaps(features, bf) > --------------------------------- > > =============================== > Sang Chul Choi Ph.D. > Research Associate > > Sang Chul Choi (222 WRRB IAB) > 902 Koyukuk Dr (PO Box 757000) > 311 Irving I, University of Alaska Fairbanks > Fairbanks, AK 99775-7000 > > Office +1-907-474-5071 > Fax +1-907-474-6967 > Google Phone +1-607-542-9362 > =============================== > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode

I agree with SangChul, it would be a very nice feature if only alignments passing a certain mapq threshold could be read from a BAM file, ideally by specifying such a filter in the ScanBamParam...

I was just wondering... there is still no possibility out there? I searched for it but might have overlooked it... The two-step approach (filtering the original BAM file and saving it to a new BAM file) is time and disk-space consuming...

cheers, jo

ADD REPLY
0
Entering edit mode
Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 22 days ago
Italy

In case you still need it, Martin has included an appropriate parameter to the ScanBamParam. See post: Reading alignments above a mapq threshold from BAM files

cheers, jo

ADD COMMENT

Login before adding your answer.

Traffic: 694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6