summarizeOverlaps vs countOverlaps
1
0
Entering edit mode
swaraj basu ▴ 50
@swaraj-basu-4629
Last seen 7.1 years ago
Dear All, I am getting completely different results for the count of mapped reads on transcript features using the countOverlaps and summarizeOverlaps. CountOverlaps on using a gappedAlignment and Granges object gives slightly different results. SummarizeOverlaps gives totally different results for the modes "Union" and "intersectionNotEmpty". Can someone please let me know where I am making a mistake. I have attached a screenshot of the transcripts. All the transcripts belong to the same gene. Following steps were performed by me on a small subset of my data 1. I built a gapped alignment object of mapped reads using the command reads=readBamGappedAlignments("path / to / bam/sorted.bam") 2. From the gapped alignment I built a genomic ranges object ranges=GRanges(seqnames=rname(reads), ranges=IRanges(start=start(reads), end=end(reads)),strand=rep("*",length(reads))) 3. I built a genomic ranges list from a custom genedb object (4 transcripts). geneE=exonsBy(genedb,'gene') 4. Used the count and summarize overlaps functions a). count1=countOverlaps(geneE, reads) b). count2=countOverlaps(geneE, ranges) c). count3=assays(summarizeOverlaps(geneE, reads, mode="UNION"))$counts[,1] c). count4=assays(summarizeOverlaps(geneE, reads, mode="intersectionNotEmpty"))$counts[,1] 5. I get four different count results "T1, T2, T3, T4" a). count1 "55972, 41029, 18270, 18734" b). count2 "55989, 41046, 18287, 18751" c). count3 "0, 2, 0, 1092" d). count4 "0, 3712, 0, 2550" -- Swaraj Basu PhD Student (Bioinformatics - Functional Genomics) Animal Physiology and Evolution Stazione Zoologica Anton Dohrn Naples -------------- next part -------------- A non-text attachment was scrubbed... Name: transcripts.png Type: image/png Size: 12062 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20120229="" 3e4c18b1="" attachment.png="">
Alignment Alignment • 2.0k views
0
Entering edit mode
@valerie-obenchain-4275
Last seen 23 months ago
United States
Hello, You should expect different results from countOverlaps and summarizeOverlaps because they count in different ways. The different modes of summarizeOverlaps counting are explained in the vignette, "Overview of summarizeOverlaps", found by browseVignettes("GenomicRanges") or at http://bioconductor.org/packages/2.10/bioc/html/GenomicRanges.html summarizeOverlaps was developed to provide count methods that do not "double count" a read that hit more than one feature (where a feature is an exon, transcript etc.). summarizeOverlaps also treats a GRanges and GRangesList differently. This is explained in the man page and in the vignette. Once you've had a chance to read the documentation let me know if this is not clear and I'd be happy to explain with examples. Note that the GappedAlignments container is gap-aware. When a GappedAlignment is coerced to a GRangesList the portions of the gapped read become distinct rows in the new GRanges obejct. When a GappedAlignment is coerced to a GRanges the gaps are collapsed and the range is continuous. In other words we are no longer aware of our gaps. ## Here we create the three objects with the same reads, galn <- GappedAlignments( names = c("a","b","c"), rname = Rle(rep("chr1", 3)), pos = as.integer(c(1400, 3100, 5200)), cigar = c("500M", "50M200N50M", "50M150N50M"), strand = strand(rep("+", 3))) gr <- as(galn, "GRanges") grl <- as(galn, "GRangesList") ## Create a subject for overlap testing that falls in the gap of the second range > subj <- GRanges("chr1", IRanges(3200, 3300)) > subj GRanges with 1 range and 0 elementMetadata cols: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [3200, 3300] * --- seqlengths: chr1 NA ## Investigate how this range is counted > countOverlaps(galn, subj) a b c 0 0 0 > countOverlaps(gr, subj) a b c 0 1 0 > countOverlaps(grl, subj) a b c 0 0 0 Valerie On 02/29/12 10:24, swaraj basu wrote: > Dear All, > > I am getting completely different results for the count of mapped reads on > transcript features using the countOverlaps and summarizeOverlaps. > > CountOverlaps on using a gappedAlignment and Granges object gives slightly > different results. SummarizeOverlaps gives totally different results for > the modes "Union" and "intersectionNotEmpty". > > Can someone please let me know where I am making a mistake. I have attached > a screenshot of the transcripts. All the transcripts belong to the same > gene. > > Following steps were performed by me on a small subset of my data > > 1. I built a gapped alignment object of mapped reads using the command > reads=readBamGappedAlignments("path / to / bam/sorted.bam") > > 2. From the gapped alignment I built a genomic ranges object > ranges=GRanges(seqnames=rname(reads), > ranges=IRanges(start=start(reads), > end=end(reads)),strand=rep("*",length(reads))) > > 3. I built a genomic ranges list from a custom genedb object (4 > transcripts). > geneE=exonsBy(genedb,'gene') > > 4. Used the count and summarize overlaps functions > a). count1=countOverlaps(geneE, reads) > b). count2=countOverlaps(geneE, ranges) > c). count3=assays(summarizeOverlaps(geneE, reads, > mode="UNION"))$counts[,1] > c). count4=assays(summarizeOverlaps(geneE, reads, > mode="intersectionNotEmpty"))$counts[,1] > > 5. I get four different count results > "T1, T2, T3, T4" > a). count1 "55972, 41029, 18270, 18734" > b). count2 "55989, 41046, 18287, 18751" > c). count3 "0, 2, 0, 1092" > d). count4 "0, 3712, 0, 2550" > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor