Entering edit mode
swaraj basu
▴
50
@swaraj-basu-4629
Last seen 10.3 years ago
Dear Valerie,
Thanks a lot for your explanation. This makes it a lot clear to me. My
problem is if I have to get RPKM value from read count then what
should I
explicitly trust, summarizeOverlaps or countOverlaps. What I have
understood is, to work on gene RPKM I should rely upon
summarizeOverlaps
while to work on RPKM values of transcripts I should consider
countOverlaps. Do let me know if I am correct in my assumption.
regards
Swaraj
On Wed, Feb 29, 2012 at 9:07 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> Hello,
>
> You should expect different results from countOverlaps and
> summarizeOverlaps because they count in different ways.
>
> The different modes of summarizeOverlaps counting are explained in
the
> vignette, "Overview of summarizeOverlaps", found by
>
> browseVignettes("**GenomicRanges") or at
>
> http://bioconductor.org/**packages/2.10/bioc/html/**GenomicRanges.ht
ml<http: bioconductor.org="" packages="" 2.10="" bioc="" html="" genomicranges.html="">
>
> summarizeOverlaps was developed to provide count methods that do not
> "double count" a read that hit more than one feature (where a
feature is an
> exon, transcript etc.). summarizeOverlaps also treats a GRanges and
> GRangesList differently. This is explained in the man page and in
the
> vignette. Once you've had a chance to read the documentation let me
know if
> this is not clear and I'd be happy to explain with examples.
>
> Note that the GappedAlignments container is gap-aware. When a
> GappedAlignment is coerced to a GRangesList the portions of the
gapped read
> become distinct rows in the new GRanges obejct. When a
GappedAlignment is
> coerced to a GRanges the gaps are collapsed and the range is
continuous. In
> other words we are no longer aware of our gaps.
>
> ## Here we create the three objects with the same reads,
>
> galn <- GappedAlignments(
> names = c("a","b","c"),
> rname = Rle(rep("chr1", 3)),
> pos = as.integer(c(1400, 3100, 5200)),
> cigar = c("500M", "50M200N50M", "50M150N50M"),
> strand = strand(rep("+", 3)))
> gr <- as(galn, "GRanges")
> grl <- as(galn, "GRangesList")
>
> ## Create a subject for overlap testing that falls in the gap of the
> second range
>
> > subj <- GRanges("chr1", IRanges(3200, 3300))
> > subj
> GRanges with 1 range and 0 elementMetadata cols:
> seqnames ranges strand
> <rle> <iranges> <rle>
> [1] chr1 [3200, 3300] *
> ---
> seqlengths:
> chr1
> NA
>
> ## Investigate how this range is counted
>
> > countOverlaps(galn, subj)
> a b c
> 0 0 0
> > countOverlaps(gr, subj)
> a b c
> 0 1 0
> > countOverlaps(grl, subj)
> a b c
> 0 0 0
>
>
> Valerie
>
>
> On 02/29/12 10:24, swaraj basu wrote:
>
>> Dear All,
>>
>> I am getting completely different results for the count of mapped
reads on
>> transcript features using the countOverlaps and summarizeOverlaps.
>>
>> CountOverlaps on using a gappedAlignment and Granges object gives
slightly
>> different results. SummarizeOverlaps gives totally different
results for
>> the modes "Union" and "intersectionNotEmpty".
>>
>> Can someone please let me know where I am making a mistake. I have
>> attached
>> a screenshot of the transcripts. All the transcripts belong to the
same
>> gene.
>>
>> Following steps were performed by me on a small subset of my data
>>
>> 1. I built a gapped alignment object of mapped reads using the
command
>> reads=readBamGappedAlignments(**"path / to / bam/sorted.bam")
>>
>> 2. From the gapped alignment I built a genomic ranges object
>> ranges=GRanges(seqnames=rname(**reads),
>> ranges=IRanges(start=start(**reads),
>> end=end(reads)),strand=rep("*"**,length(reads)))
>>
>> 3. I built a genomic ranges list from a custom genedb object (4
>> transcripts).
>> geneE=exonsBy(genedb,'gene')
>>
>> 4. Used the count and summarize overlaps functions
>> a). count1=countOverlaps(geneE, reads)
>> b). count2=countOverlaps(geneE, ranges)
>> c). count3=assays(**summarizeOverlaps(geneE, reads,
>> mode="UNION"))$counts[,1]
>> c). count4=assays(**summarizeOverlaps(geneE, reads,
>> mode="intersectionNotEmpty"))$**counts[,1]
>>
>> 5. I get four different count results
>> "T1, T2, T3, T4"
>> a). count1 "55972, 41029, 18270, 18734"
>> b). count2 "55989, 41046, 18287, 18751"
>> c). count3 "0, 2, 0, 1092"
>> d). count4 "0, 3712, 0, 2550"
>>
>>
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor="">
>>
>
>
--
Swaraj Basu
PhD Student (Bioinformatics - Functional Genomics)
Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Naples
[[alternative HTML version deleted]]