summarizeOverlaps, inter.feature=FALSE and mode="IntersectionNotEmpty"
1
0
Entering edit mode
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 9 months ago
United States
I agree, given the definition of "IntersectionNotEmpty", the count for that read in row 6 should be 1 for Feature1 and 0 for Feature2, whereas in row 7 it should be 0 for both. This also means, in this specific case there will be no difference in the read assignment (count results) when inter.feature is set to FALSE or TRUE. I guess, maining consistent feature defintions in the different counting modes will be certainly important. Thanks Michael for pointing this out. Thomas On Wed, May 15, 2013 at 03:47:46PM +0000, Valerie Obenchain wrote: > I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty > should be 0,0 instead of 1,1. > > Thomas, Martin, would you agree? > > Valerie > > > On 05/15/2013 04:47 AM, Michael Love wrote: > > hi, > > > > The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look > > very useful. > > > > I'm wondering about the behavior for inter.feature=FALSE and > > mode="IntersectionNotEmpty". > > > > Earlier on the list, from Martin Morgan, there was the proposed behavior: > > > > You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes > > 'counted twice' > > > > |*----------------------+-----+-----------+-----------+----------- ----|*|* > > Mode | Row | Feature 1 | Feature 2 | Hits per read > > |*|*----------------------+-----+-----------+-----------+--------- ------|*|* > > Union | 5 | 1 | 0 | 1 > > |*|* | 6 | 1 | 1 | > > 2 |*|* | 7 | 1 | 1 | > > 2 |*|* IntersectionStrict | 5 | 1 | 0 | > > 1 |*|* | 6 | 1 | 0 | > > 1 |*|* | 7 | 1 | 1 | > > 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 | > > 1 |*|* | 6 | 1 | 1 | > > 2 |*|* | 7 | 1 | 1 | > > 2 |*|*----------------------+-----+-----------+----------- +---------------|* > > > > ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html ) > > > > For row 6 of this diagram ( > > http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges /inst/doc/summarizeOverlaps-modes.pdf) > > and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit. > > > > The man page has: > > > > IntersectionNotEmpty : For this counting mode, the features are partitioned > > into unique disjoint regions. This is accomplished by disjoining the > > feature ranges then removing ranges shared by more than one feature. The > > result is a group of non-overlapping regions each of which belong to a > > single feature. > > Simple and gapped reads are counted if, > > 1. the read or exactly 1 of the read fragments overlaps a unique disjoint > > region > > 2. the read or >1 read fragments overlap >1 unique disjoint region from > > the same feature > > > > An example, where the read falls completely within Feature 1 but partially > > in Feature 2, the behavior is consistent with the table above: > > > >> reads <- > > GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+")) > >> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L))) > >> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty", > > inter.feature=FALSE)) > > [,1] > > [1,] 1 > > [2,] 1 > > > >> disjoin(features) > > GRanges with 3 ranges and 0 metadata columns: > > seqnames ranges strand > > <rle> <iranges> <rle> > > [1] chr1 [1300, 1699] * > > [2] chr1 [1700, 2000] * > > [3] chr1 [2001, 2400] * > > --- > > seqlengths: > > chr1 > > NA > > > >> findOverlaps(reads, disjoin(features)) > > Hits of length 2 > > queryLength: 1 > > subjectLength: 3 > > queryHits subjectHits > > <integer> <integer> > > 1 1 1 > > 2 1 2 > > > > So the example read overlaps the unique region of Feature 1 and the shared > > region, which supposedly has been removed. So I would expect it to only > > count to Feature 1. > > > >> sessionInfo() > > R Under development (unstable) (2013-05-14 r62742) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > locale: > > [1] C > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7 > > [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1 > > > > loaded via a namespace (and not attached): > > [1] stats4_3.1.0 > > > > > > thanks, > > > > Mike > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
• 828 views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
One example where 'IntersectionNotEmpty' results in different counts is when a read spans both features even after the shared regions have been removed. rd <- GAlignments("chr2", 7000L, "750M", strand("+")) ft <- GRanges("chr2", IRanges(c(7000, 7500), width=c(600, 300)), "+") When TRUE, the read can't be resolved. se1 <- summarizeOverlaps(ft, rd, mode, inter.feature=TRUE) > assays(se1)$counts [,1] [1,] 0 [2,] 0 When FALSE, each feature gets a hit. se2 <- summarizeOverlaps(ft, rd, mode, inter.feature=FALSE) > assays(se2)$counts [,1] [1,] 1 [2,] 1 Updates are implemented in GenomicRanges 1.13.12. Thanks. Valerie On 05/15/2013 10:04 AM, Thomas Girke wrote: > I agree, given the definition of "IntersectionNotEmpty", the count for > that read in row 6 should be 1 for Feature1 and 0 for Feature2, whereas > in row 7 it should be 0 for both. This also means, in this specific case > there will be no difference in the read assignment (count results) when > inter.feature is set to FALSE or TRUE. I guess, maining consistent > feature defintions in the different counting modes will be certainly > important. Thanks Michael for pointing this out. > > Thomas > > On Wed, May 15, 2013 at 03:47:46PM +0000, Valerie Obenchain wrote: >> I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty >> should be 0,0 instead of 1,1. >> >> Thomas, Martin, would you agree? >> >> Valerie >> >> >> On 05/15/2013 04:47 AM, Michael Love wrote: >>> hi, >>> >>> The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look >>> very useful. >>> >>> I'm wondering about the behavior for inter.feature=FALSE and >>> mode="IntersectionNotEmpty". >>> >>> Earlier on the list, from Martin Morgan, there was the proposed behavior: >>> >>> You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes >>> 'counted twice' >>> >>> |*----------------------+-----+-----------+-----------+----------- ----|*|* >>> Mode | Row | Feature 1 | Feature 2 | Hits per read >>> |*|*----------------------+-----+-----------+-----------+--------- ------|*|* >>> Union | 5 | 1 | 0 | 1 >>> |*|* | 6 | 1 | 1 | >>> 2 |*|* | 7 | 1 | 1 | >>> 2 |*|* IntersectionStrict | 5 | 1 | 0 | >>> 1 |*|* | 6 | 1 | 0 | >>> 1 |*|* | 7 | 1 | 1 | >>> 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 | >>> 1 |*|* | 6 | 1 | 1 | >>> 2 |*|* | 7 | 1 | 1 | >>> 2 |*|*----------------------+-----+-----------+---------- -+---------------|* >>> >>> ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html ) >>> >>> For row 6 of this diagram ( >>> http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges /inst/doc/summarizeOverlaps-modes.pdf) >>> and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit. >>> >>> The man page has: >>> >>> IntersectionNotEmpty : For this counting mode, the features are partitioned >>> into unique disjoint regions. This is accomplished by disjoining the >>> feature ranges then removing ranges shared by more than one feature. The >>> result is a group of non-overlapping regions each of which belong to a >>> single feature. >>> Simple and gapped reads are counted if, >>> 1. the read or exactly 1 of the read fragments overlaps a unique disjoint >>> region >>> 2. the read or >1 read fragments overlap >1 unique disjoint region from >>> the same feature >>> >>> An example, where the read falls completely within Feature 1 but partially >>> in Feature 2, the behavior is consistent with the table above: >>> >>>> reads <- >>> GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+")) >>>> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L))) >>>> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty", >>> inter.feature=FALSE)) >>> [,1] >>> [1,] 1 >>> [2,] 1 >>> >>>> disjoin(features) >>> GRanges with 3 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [1300, 1699] * >>> [2] chr1 [1700, 2000] * >>> [3] chr1 [2001, 2400] * >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>>> findOverlaps(reads, disjoin(features)) >>> Hits of length 2 >>> queryLength: 1 >>> subjectLength: 3 >>> queryHits subjectHits >>> <integer> <integer> >>> 1 1 1 >>> 2 1 2 >>> >>> So the example read overlaps the unique region of Feature 1 and the shared >>> region, which supposedly has been removed. So I would expect it to only >>> count to Feature 1. >>> >>>> sessionInfo() >>> R Under development (unstable) (2013-05-14 r62742) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7 >>> [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1 >>> >>> loaded via a namespace (and not attached): >>> [1] stats4_3.1.0 >>> >>> >>> thanks, >>> >>> Mike >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 1074 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6