Question: counting reads with multiple hits with summarizeOverlaps
Lolla wrote, 5 months ago:

Hi

This is about the vignette at http://bioconductor.org/packages/3.7/bioc/vignettes/GenomicAlignments/inst/doc/summarizeOverlaps.pdf

I worked through the example at the top of page six and noticed that reads 3 and 6 each overlap two features (C and G); feature G "overlaps" with C. When counting with summarizeOverlaps in Union and IntersectionNotEmpty modes, both C and G get a count of +1! My understanding is that in Union mode (as stated in the man page) reads that overlap more than one feature are not counted, and that IntersectionNotEmpty looks for a unique disjoint region and then decides where to count the read. Here we have two disjoint regions.

 

Thanks in advance..

Valerie Obenchain (United States) wrote, 5 months ago:

Hi,

I think your question is why Union mode produces a count when a read overlaps multiple ranges within a single element of a GRangesList. That's the question I'll answer. If that's wrong, please clarify.

An element of a GRangesList is considered a single feature. If a read overlaps any of the ranges in that element, and no other feature, it is counted in Union mode. In the example you refer to we have features "C" and "G":

```

> lst[c("C", "G")]
GRangesList object of length 2:
$C
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |    group_id
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1 3000-3499      + |           C
  [2]     chr1 3600-3899      + |           C

$G
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand | group_id
  [1]     chr2 3000-3149      + |        G
  [2]     chr2 3350-3549      + |        G

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

```

which overlap with reads "c" and "f":

```
> reads[c(3,6)]
GAlignments object with 2 alignments and 0 metadata columns:
    seqnames strand       cigar    qwidth     start       end     width
       <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
  c     chr1      +        300M       300      3400      3699       300
  f     chr2      +  50M200N50M       100      3100      3399       300
        njunc
    <integer>
  c         0
  f         1
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

```

Read "c" overlaps both ranges of feature "C", and read "f" overlaps both ranges of feature "G". A read can overlap one or both ranges in a GRangesList element and still be counted in Union mode, because the whole element is a single feature. This is documented on the man page:

```

features: A GRanges or a GRangesList object of genomic regions of
          interest. When a GRanges is supplied, each row is considered
          a feature. When a GRangesList is supplied, each higher
          list-level is considered a feature. This distinction is
          important when defining overlaps.

```
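To see that distinction on the data from this thread, here is a minimal sketch that rebuilds only features "C" and "G" and reads "c" and "f" from the output shown above, then counts them once with a flat GRanges and once with a GRangesList. The range names C1, C2, G1, G2 and the variable names `by_row` and `by_element` are made up for illustration; they are not from the vignette.

```r
library(GenomicAlignments)  # attaches GenomicRanges and SummarizedExperiment

# The four ranges behind features "C" and "G", copied from the output above.
gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2", "chr2"),
    ranges   = IRanges(start = c(3000, 3600, 3000, 3350),
                       end   = c(3499, 3899, 3149, 3549)),
    strand   = "+")
names(gr) <- c("C1", "C2", "G1", "G2")  # illustrative names only

# Group the ranges so that each list element is one feature.
grl <- split(gr, c("C", "C", "G", "G"))

# Reads "c" and "f", copied from the output above.
reads <- GAlignments(
    names    = c("c", "f"),
    seqnames = Rle(c("chr1", "chr2")),
    pos      = c(3400L, 3100L),
    cigar    = c("300M", "50M200N50M"),
    strand   = strand(c("+", "+")))

# GRanges: every row is its own feature.
by_row <- assays(summarizeOverlaps(gr, reads, mode = Union))$counts

# GRangesList: every list element is one feature.
by_element <- assays(summarizeOverlaps(grl, reads, mode = Union))$counts

by_row
by_element
```

With the GRangesList, each read hits exactly one feature, so "C" and "G" should each get one count. With the flat GRanges, each read hits two separate features (for example C1 and C2), so Union should discard it and the counts should be zero.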

There is an example that further explains counting with features in a GRangesList at the bottom of the man page under 'Counting modes', specifically the section starting with this comment:

```

     ## The GRangesList ('grl' object) has 8 features whereas the GRanges
     ## ('gr' object) has 11. The affect on counting can be seen by looking
     ## at feature 'H' with mode 'Union'. In the GRanges this feature is
     ## represented by ranges 'H1' and 'H2',

```

If going through that example does not answer your question, please post a code example showing what you are unsure about.

Valerie
