Entering edit mode
I agree, given the definition of "IntersectionNotEmpty", the count for
that read in row 6 should be 1 for Feature1 and 0 for Feature2,
whereas
in row 7 it should be 0 for both. This also means, in this specific
case
there will be no difference in the read assignment (count results)
when
inter.feature is set to FALSE or TRUE. I guess, maining consistent
feature defintions in the different counting modes will be certainly
important. Thanks Michael for pointing this out.
Thomas
On Wed, May 15, 2013 at 03:47:46PM +0000, Valerie Obenchain wrote:
> I agree Mike. Following this same logic, row 7 for
IntersectionNotEmpty
> should be 0,0 instead of 1,1.
>
> Thomas, Martin, would you agree?
>
> Valerie
>
>
> On 05/15/2013 04:47 AM, Michael Love wrote:
> > hi,
> >
> > The new arguments 'inter.feature' and 'fragments' of
summarizeOverlaps look
> > very useful.
> >
> > I'm wondering about the behavior for inter.feature=FALSE and
> > mode="IntersectionNotEmpty".
> >
> > Earlier on the list, from Martin Morgan, there was the proposed
behavior:
> >
> > You'd like to add 3 more modes, with inter_feature=FALSE. Reads
are sometimes
> > 'counted twice'
> >
> > |*----------------------+-----+-----------+-----------+-----------
----|*|*
> > Mode | Row | Feature 1 | Feature 2 | Hits per read
> > |*|*----------------------+-----+-----------+-----------+---------
------|*|*
> > Union | 5 | 1 | 0 | 1
> > |*|* | 6 | 1 | 1 |
> > 2 |*|* | 7 | 1 | 1 |
> > 2 |*|* IntersectionStrict | 5 | 1 | 0 |
> > 1 |*|* | 6 | 1 | 0 |
> > 1 |*|* | 7 | 1 | 1 |
> > 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 |
> > 1 |*|* | 6 | 1 | 1 |
> > 2 |*|* | 7 | 1 | 1 |
> > 2 |*|*----------------------+-----+-----------+-----------
+---------------|*
> >
> > (
https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html )
> >
> > For row 6 of this diagram (
> > http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges
/inst/doc/summarizeOverlaps-modes.pdf)
> > and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit.
> >
> > The man page has:
> >
> > IntersectionNotEmpty : For this counting mode, the features are
partitioned
> > into unique disjoint regions. This is accomplished by disjoining
the
> > feature ranges then removing ranges shared by more than one
feature. The
> > result is a group of non-overlapping regions each of which belong
to a
> > single feature.
> > Simple and gapped reads are counted if,
> > 1. the read or exactly 1 of the read fragments overlaps a unique
disjoint
> > region
> > 2. the read or >1 read fragments overlap >1 unique disjoint region
from
> > the same feature
> >
> > An example, where the read falls completely within Feature 1 but
partially
> > in Feature 2, the behavior is consistent with the table above:
> >
> >> reads <-
> >
GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+"))
> >> features <-
GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L)))
> >>
assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty",
> > inter.feature=FALSE))
> > [,1]
> > [1,] 1
> > [2,] 1
> >
> >> disjoin(features)
> > GRanges with 3 ranges and 0 metadata columns:
> > seqnames ranges strand
> > <rle> <iranges> <rle>
> > [1] chr1 [1300, 1699] *
> > [2] chr1 [1700, 2000] *
> > [3] chr1 [2001, 2400] *
> > ---
> > seqlengths:
> > chr1
> > NA
> >
> >> findOverlaps(reads, disjoin(features))
> > Hits of length 2
> > queryLength: 1
> > subjectLength: 3
> > queryHits subjectHits
> > <integer> <integer>
> > 1 1 1
> > 2 1 2
> >
> > So the example read overlaps the unique region of Feature 1 and
the shared
> > region, which supposedly has been removed. So I would expect it to
only
> > count to Feature 1.
> >
> >> sessionInfo()
> > R Under development (unstable) (2013-05-14 r62742)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets
methods
> > [8] base
> >
> > other attached packages:
> > [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7
> > [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1
> >
> > loaded via a namespace (and not attached):
> > [1] stats4_3.1.0
> >
> >
> > thanks,
> >
> > Mike
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor