
Question: counting reads with multiple hits with summarizeOverlaps
10 months ago by
Lolla0
Lolla0 wrote:

Hi

I worked through the example at the top of page six. I noticed that reads 3 and 6 overlap two features (C and G), and feature G "overlaps" with C. When counting with summarizeOverlaps, both the Union and IntersectionNotEmpty modes count +1 for each of them! My understanding is that in Union mode (as stated in the man page) reads that overlap more than one feature are not counted, and that IntersectionNotEmpty looks for a unique disjoint region and then decides where to count the read. Here we have 2 disjoint regions.

R ngs summarizeoverlaps seq • 203 views
modified 10 months ago by Valerie Obenchain6.7k • written 10 months ago by Lolla0
10 months ago by
United States
Valerie Obenchain6.7k wrote:

Hi,

I think you are asking why Union mode produces a count when a read overlaps multiple ranges within a single element of a GRangesList. That's the question I'll answer - if I've misread it, please clarify.

An element of a GRangesList is considered a single feature. If a read overlaps any of the ranges in that element, and no other feature, it is counted once for that feature in Union mode. In the example you refer to we have features "C" and "G":



> lst[c("C", "G")]
GRangesList object of length 2:
$C
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |    group_id
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1 3000-3499      + |           C
  [2]     chr1 3600-3899      + |           C

$G
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand | group_id
  [1]     chr2 3000-3149      + |        G
  [2]     chr2 3350-3549      + |        G

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths



which overlap with reads "c" and "f":


GAlignments object with 2 alignments and 0 metadata columns:
seqnames strand       cigar    qwidth     start       end     width
<Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
c     chr1      +        300M       300      3400      3699       300
f     chr2      +  50M200N50M       100      3100      3399       300
njunc
<integer>
c         0
f         1
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths



Reads "c" and "f" overlap both ranges in features "C" and "G", respectively. A read can overlap one or both ranges in a GRangesList element and still be counted in Union mode, because the whole element is treated as a single feature. This is documented on the man page:



features: A GRanges or a GRangesList object of genomic regions of
interest. When a GRanges is supplied, each row is considered
a feature. When a GRangesList is supplied, each higher
list-level is considered a feature. This distinction is
important when defining overlaps.
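
To make the GRanges versus GRangesList distinction concrete, here is a minimal sketch that rebuilds only features "C" and "G" and reads "c" and "f" from the coordinates printed above, then counts them both ways. The object names (`grl`, `gr`, `se_grl`, `se_gr`) and the range names "C1", "C2", "G1", "G2" are made up for illustration, and the metadata column `group_id` is left out; it is not the full example from the man page.

```r
library(GenomicAlignments)  # also attaches GenomicRanges and SummarizedExperiment

# Features "C" and "G", using the coordinates shown in the printed GRangesList
C <- GRanges("chr1", IRanges(c(3000, 3600), c(3499, 3899)), strand = "+")
G <- GRanges("chr2", IRanges(c(3000, 3350), c(3149, 3549)), strand = "+")
grl <- GRangesList(C = C, G = G)

# Reads "c" and "f", using the positions and CIGARs shown in the GAlignments
reads <- GAlignments(
    names = c("c", "f"),
    seqnames = Rle(c("chr1", "chr2")),
    pos = c(3400L, 3100L),
    cigar = c("300M", "50M200N50M"),
    strand = strand(c("+", "+")))

# GRangesList: each list element is one feature, so a read that hits
# both ranges of "C" still overlaps only a single feature
se_grl <- summarizeOverlaps(grl, reads, mode = "Union")
assay(se_grl)

# GRanges: each range is its own feature, so the same read now
# overlaps two features
gr <- unlist(grl)
names(gr) <- c("C1", "C2", "G1", "G2")  # hypothetical names for illustration
se_gr <- summarizeOverlaps(gr, reads, mode = "Union")
assay(se_gr)
```

With the GRangesList, "C" and "G" should each get a count of 1. With the flat GRanges, each read overlaps two separate features, so Union should discard it and all four counts should be 0.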



There is an example that further explains counting with features in a GRangesList at the bottom of the man page under 'Counting modes', specifically the section starting with this comment:



## The GRangesList ('grl' object) has 8 features whereas the GRanges
## ('gr' object) has 11. The affect on counting can be seen by looking
## at feature 'H' with mode 'Union'. In the GRanges this feature is
## represented by ranges 'H1' and 'H2',