summarizeOverlaps vs countOverlaps

0

Entering edit mode

swaraj basu ▴ 50

@swaraj-basu-4629

Last seen 9.6 years ago

Dear All, I am getting completely different results for the count of mapped reads on transcript features using the countOverlaps and summarizeOverlaps. CountOverlaps on using a gappedAlignment and Granges object gives slightly different results. SummarizeOverlaps gives totally different results for the modes "Union" and "intersectionNotEmpty". Can someone please let me know where I am making a mistake. I have attached a screenshot of the transcripts. All the transcripts belong to the same gene. Following steps were performed by me on a small subset of my data 1. I built a gapped alignment object of mapped reads using the command reads=readBamGappedAlignments("path / to / bam/sorted.bam") 2. From the gapped alignment I built a genomic ranges object ranges=GRanges(seqnames=rname(reads), ranges=IRanges(start=start(reads), end=end(reads)),strand=rep("*",length(reads))) 3. I built a genomic ranges list from a custom genedb object (4 transcripts). geneE=exonsBy(genedb,'gene') 4. Used the count and summarize overlaps functions a). count1=countOverlaps(geneE, reads) b). count2=countOverlaps(geneE, ranges) c). count3=assays(summarizeOverlaps(geneE, reads, mode="UNION"))$counts[,1] c). count4=assays(summarizeOverlaps(geneE, reads, mode="intersectionNotEmpty"))$counts[,1] 5. I get four different count results "T1, T2, T3, T4" a). count1 "55972, 41029, 18270, 18734" b). count2 "55989, 41046, 18287, 18751" c). count3 "0, 2, 0, 1092" d). count4 "0, 3712, 0, 2550" -- Swaraj Basu PhD Student (Bioinformatics - Functional Genomics) Animal Physiology and Evolution Stazione Zoologica Anton Dohrn Naples -------------- next part -------------- A non-text attachment was scrubbed... Name: transcripts.png Type: image/png Size: 12062 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20120229="" 3e4c18b1="" attachment.png="">

Alignment Alignment • 2.9k views

ADD COMMENT • link updated 12.1 years ago by Valerie Obenchain ★ 6.8k • written 12.1 years ago by swaraj basu ▴ 50

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hello, You should expect different results from countOverlaps and summarizeOverlaps because they count in different ways. The different modes of summarizeOverlaps counting are explained in the vignette, "Overview of summarizeOverlaps", found by browseVignettes("GenomicRanges") or at http://bioconductor.org/packages/2.10/bioc/html/GenomicRanges.html summarizeOverlaps was developed to provide count methods that do not "double count" a read that hit more than one feature (where a feature is an exon, transcript etc.). summarizeOverlaps also treats a GRanges and GRangesList differently. This is explained in the man page and in the vignette. Once you've had a chance to read the documentation let me know if this is not clear and I'd be happy to explain with examples. Note that the GappedAlignments container is gap-aware. When a GappedAlignment is coerced to a GRangesList the portions of the gapped read become distinct rows in the new GRanges obejct. When a GappedAlignment is coerced to a GRanges the gaps are collapsed and the range is continuous. In other words we are no longer aware of our gaps. ## Here we create the three objects with the same reads, galn <- GappedAlignments( names = c("a","b","c"), rname = Rle(rep("chr1", 3)), pos = as.integer(c(1400, 3100, 5200)), cigar = c("500M", "50M200N50M", "50M150N50M"), strand = strand(rep("+", 3))) gr <- as(galn, "GRanges") grl <- as(galn, "GRangesList") ## Create a subject for overlap testing that falls in the gap of the second range > subj <- GRanges("chr1", IRanges(3200, 3300)) > subj GRanges with 1 range and 0 elementMetadata cols: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [3200, 3300] * --- seqlengths: chr1 NA ## Investigate how this range is counted > countOverlaps(galn, subj) a b c 0 0 0 > countOverlaps(gr, subj) a b c 0 1 0 > countOverlaps(grl, subj) a b c 0 0 0 Valerie On 02/29/12 10:24, swaraj basu wrote: > Dear All, > > I am getting completely different results for the count of mapped reads on > transcript features using the countOverlaps and summarizeOverlaps. > > CountOverlaps on using a gappedAlignment and Granges object gives slightly > different results. SummarizeOverlaps gives totally different results for > the modes "Union" and "intersectionNotEmpty". > > Can someone please let me know where I am making a mistake. I have attached > a screenshot of the transcripts. All the transcripts belong to the same > gene. > > Following steps were performed by me on a small subset of my data > > 1. I built a gapped alignment object of mapped reads using the command > reads=readBamGappedAlignments("path / to / bam/sorted.bam") > > 2. From the gapped alignment I built a genomic ranges object > ranges=GRanges(seqnames=rname(reads), > ranges=IRanges(start=start(reads), > end=end(reads)),strand=rep("*",length(reads))) > > 3. I built a genomic ranges list from a custom genedb object (4 > transcripts). > geneE=exonsBy(genedb,'gene') > > 4. Used the count and summarize overlaps functions > a). count1=countOverlaps(geneE, reads) > b). count2=countOverlaps(geneE, ranges) > c). count3=assays(summarizeOverlaps(geneE, reads, > mode="UNION"))$counts[,1] > c). count4=assays(summarizeOverlaps(geneE, reads, > mode="intersectionNotEmpty"))$counts[,1] > > 5. I get four different count results > "T1, T2, T3, T4" > a). count1 "55972, 41029, 18270, 18734" > b). count2 "55989, 41046, 18287, 18751" > c). count3 "0, 2, 0, 1092" > d). count4 "0, 3712, 0, 2550" > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.1 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.