summarizeOverlaps, inter.feature=FALSE and mode="IntersectionNotEmpty"
1
0
Entering edit mode
@mikelove
Last seen 42 minutes ago
United States
hi, The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look very useful. I'm wondering about the behavior for inter.feature=FALSE and mode="IntersectionNotEmpty". Earlier on the list, from Martin Morgan, there was the proposed behavior: You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes 'counted twice' |*----------------------+-----+-----------+-----------+--------------- |*|* Mode | Row | Feature 1 | Feature 2 | Hits per read |*|*----------------------+-----+-----------+-----------+------------- --|*|* Union | 5 | 1 | 0 | 1 |*|* | 6 | 1 | 1 | 2 |*|* | 7 | 1 | 1 | 2 |*|* IntersectionStrict | 5 | 1 | 0 | 1 |*|* | 6 | 1 | 0 | 1 |*|* | 7 | 1 | 1 | 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 | 1 |*|* | 6 | 1 | 1 | 2 |*|* | 7 | 1 | 1 | 2 |*|*----------------------+-----+-----------+-----------+---- -----------|* ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html ) For row 6 of this diagram ( http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/ins t/doc/summarizeOverlaps-modes.pdf) and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit. The man page has: IntersectionNotEmpty : For this counting mode, the features are partitioned into unique disjoint regions. This is accomplished by disjoining the feature ranges then removing ranges shared by more than one feature. The result is a group of non-overlapping regions each of which belong to a single feature. Simple and gapped reads are counted if, 1. the read or exactly 1 of the read fragments overlaps a unique disjoint region 2. the read or >1 read fragments overlap >1 unique disjoint region from the same feature An example, where the read falls completely within Feature 1 but partially in Feature 2, the behavior is consistent with the table above: > reads <- GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+")) > features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L))) > assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty", inter.feature=FALSE)) [,1] [1,] 1 [2,] 1 > disjoin(features) GRanges with 3 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1300, 1699] * [2] chr1 [1700, 2000] * [3] chr1 [2001, 2400] * --- seqlengths: chr1 NA > findOverlaps(reads, disjoin(features)) Hits of length 2 queryLength: 1 subjectLength: 3 queryHits subjectHits <integer> <integer> 1 1 1 2 1 2 So the example read overlaps the unique region of Feature 1 and the shared region, which supposedly has been removed. So I would expect it to only count to Feature 1. > sessionInfo() R Under development (unstable) (2013-05-14 r62742) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7 [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1 loaded via a namespace (and not attached): [1] stats4_3.1.0 thanks, Mike [[alternative HTML version deleted]]
• 1.0k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty should be 0,0 instead of 1,1. Thomas, Martin, would you agree? Valerie On 05/15/2013 04:47 AM, Michael Love wrote: > hi, > > The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look > very useful. > > I'm wondering about the behavior for inter.feature=FALSE and > mode="IntersectionNotEmpty". > > Earlier on the list, from Martin Morgan, there was the proposed behavior: > > You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes > 'counted twice' > > |*----------------------+-----+-----------+-----------+------------- --|*|* > Mode | Row | Feature 1 | Feature 2 | Hits per read > |*|*----------------------+-----+-----------+-----------+----------- ----|*|* > Union | 5 | 1 | 0 | 1 > |*|* | 6 | 1 | 1 | > 2 |*|* | 7 | 1 | 1 | > 2 |*|* IntersectionStrict | 5 | 1 | 0 | > 1 |*|* | 6 | 1 | 0 | > 1 |*|* | 7 | 1 | 1 | > 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 | > 1 |*|* | 6 | 1 | 1 | > 2 |*|* | 7 | 1 | 1 | > 2 |*|*----------------------+-----+-----------+-----------+- --------------|* > > ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html ) > > For row 6 of this diagram ( > http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/i nst/doc/summarizeOverlaps-modes.pdf) > and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit. > > The man page has: > > IntersectionNotEmpty : For this counting mode, the features are partitioned > into unique disjoint regions. This is accomplished by disjoining the > feature ranges then removing ranges shared by more than one feature. The > result is a group of non-overlapping regions each of which belong to a > single feature. > Simple and gapped reads are counted if, > 1. the read or exactly 1 of the read fragments overlaps a unique disjoint > region > 2. the read or >1 read fragments overlap >1 unique disjoint region from > the same feature > > An example, where the read falls completely within Feature 1 but partially > in Feature 2, the behavior is consistent with the table above: > >> reads <- > GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+")) >> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L))) >> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty", > inter.feature=FALSE)) > [,1] > [1,] 1 > [2,] 1 > >> disjoin(features) > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1300, 1699] * > [2] chr1 [1700, 2000] * > [3] chr1 [2001, 2400] * > --- > seqlengths: > chr1 > NA > >> findOverlaps(reads, disjoin(features)) > Hits of length 2 > queryLength: 1 > subjectLength: 3 > queryHits subjectHits > <integer> <integer> > 1 1 1 > 2 1 2 > > So the example read overlaps the unique region of Feature 1 and the shared > region, which supposedly has been removed. So I would expect it to only > count to Feature 1. > >> sessionInfo() > R Under development (unstable) (2013-05-14 r62742) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7 > [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1 > > loaded via a namespace (and not attached): > [1] stats4_3.1.0 > > > thanks, > > Mike > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT

Login before adding your answer.

Traffic: 1073 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6