Question: Find query ranges that start somewhere in a subject range using GRanges / IRanges and findOverlaps()
svenbioinf wrote (3.3 years ago):

Using findOverlaps() on GRanges objects, I would like to retrieve the following hits:

Sub:  -------|||||||||||||------------------

Query ranges:
Hit5: ---------||||||||||||||||||------------
Hit6: -----------|||||||||||||||||||---------
Hit4: ---------|||||||---------------------
...

That is, I want the query ranges that start inside a subject range, whether or not they extend past its end.

Minimal example:

> sub <- GRanges(c(1),strand=Rle(c("+"),c(1)), IRanges(c(5), c(7)),mcols=data.frame(id=c("T1")))

> sub
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand | mcols.id
         <Rle> <IRanges>  <Rle> | <factor>
  [1]        1    [5, 7]      + |       T1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> query <- GRanges(c(1,1,1,1,1,1),strand=Rle(c("+","+","+","+","+","+"),c(1,1,1,1,1,1)), IRanges(c(4,4,6,6,7,7), c(5,5,6,6,8,8)),mcols=data.frame(id=c("T8","T9","T10","T11","T12","T13")))
> query
GRanges object with 6 ranges and 1 metadata column:
      seqnames    ranges strand | mcols.id
         <Rle> <IRanges>  <Rle> | <factor>
  [1]        1    [4, 5]      + |       T8
  [2]        1    [4, 5]      + |       T9
  [3]        1    [6, 6]      + |      T10
  [4]        1    [6, 6]      + |      T11
  [5]        1    [7, 8]      + |      T12
  [6]        1    [7, 8]      + |      T13
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


I tried countOverlaps() with type = "start", but that only counts hits where query and subject start at exactly the same position:

> sum(countOverlaps(query,sub))
[1] 6
> sum(countOverlaps(query,sub,type="start"))
[1] 0
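For a single subject range, the rule described above can be checked directly by comparing coordinates, which also shows why type = "start" returns 0. A minimal sketch, assuming the GenomicRanges Bioconductor package is installed; it rebuilds the example with simpler constructor calls, so the metadata column is named id rather than mcols.id:

```r
library(GenomicRanges)

# Same coordinates as the minimal example above, all on the "+" strand.
sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1", IRanges(c(4, 4, 6, 6, 7, 7), c(5, 5, 6, 6, 8, 8)),
                 strand = "+", id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# type = "start" requires query and subject to share the same start (5 here);
# the query starts are 4, 6 and 7, so nothing matches.
n_same_start <- sum(countOverlaps(query, sub, type = "start"))
n_same_start

# The intended rule for this single subject: the query start lies in [5, 7].
starts_inside <- start(query) >= start(sub) & start(query) <= end(sub)
query$id[starts_inside]
```

This brute-force comparison only works for one subject range; with many subjects an overlap query on the start positions is the natural generalization.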


There must be a way; thanks for looking into this!

Answer:

Michael Lawrence wrote (3.3 years ago):
findOverlaps(start(query), subject)
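Applied to the minimal example from the question, the idea is to overlap only the query start positions with the subject ranges. A sketch, assuming GenomicRanges is installed: the starts are wrapped in width-1 IRanges explicitly and ranges(sub) is used as the subject, so seqnames and strand are both ignored. That is only safe when everything sits on one chromosome and one strand, as in this example.

```r
library(GenomicRanges)

sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1", IRanges(c(4, 4, 6, 6, 7, 7), c(5, 5, 6, 6, 8, 8)),
                 strand = "+", id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# One width-1 range per query start; seqnames and strand are dropped here.
start_points <- IRanges(start(query), width = 1L)
hits <- findOverlaps(start_points, ranges(sub))

queryHits(hits)            # indices of queries whose start lies in a subject
query$id[queryHits(hits)]
```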


Hi Michael! Oh, I understand what you are doing here!
However:

findOverlaps(start(query), subject)

Here, subject has to be an IRanges object, which doesn't carry strand information. I can get the IRanges with ranges(sub), but then I have to take care of the strand information myself.
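One way to take care of the strand information without dropping to IRanges is to build width-1 GRanges at each query's start, keeping seqnames and strand, and overlap those with the GRanges subject. A minimal sketch, assuming GenomicRanges is installed and assuming that for "-" strand features the relevant "start" is the 5' end (the larger coordinate):

```r
library(GenomicRanges)

sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1", IRanges(c(4, 4, 6, 6, 7, 7), c(5, 5, 6, 6, 8, 8)),
                 strand = "+", id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# Assumption: on "-" the start of a feature is its 5' end (end coordinate);
# on "+" and "*" it is the leftmost coordinate.
five_prime <- ifelse(as.character(strand(query)) == "-", end(query), start(query))

# Width-1 GRanges that keep each query's seqname and strand.
query_starts <- GRanges(seqnames(query), IRanges(five_prime, width = 1L),
                        strand = strand(query))

# GRanges overlaps respect seqnames and strand (ignore.strand = FALSE by default).
hits <- findOverlaps(query_starts, sub)
query$id[queryHits(hits)]
```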


This is a nice solution, thank you very much Michael!

(reply written 3.3 years ago by svenbioinf)

Sorry, here is a better way for GRanges:

findOverlaps(resize(query, 1L), subject)
(reply written 3.3 years ago by Michael Lawrence)
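resize(query, 1L) does the 5'-end bookkeeping in a single call. A sketch on the same example, assuming GenomicRanges is installed; the "-" strand range named minus is hypothetical, added only to show how resize() treats that strand:

```r
library(GenomicRanges)

sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1", IRanges(c(4, 4, 6, 6, 7, 7), c(5, 5, 6, 6, 8, 8)),
                 strand = "+", id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# resize() with its default fix = "start" anchors at the 5' end:
# start(x) for "+" and "*" ranges, end(x) for "-" ranges.
hits <- findOverlaps(resize(query, 1L), sub)
query$id[queryHits(hits)]

# Hypothetical "-" strand range [3, 6]: its 5' end is 6, so resize() keeps [6, 6].
minus <- GRanges("1", IRanges(3, 6), strand = "-")
minus_start <- resize(minus, 1L)
start(minus_start)
# No hit: position 6 lies inside `sub`, but `sub` is on the "+" strand.
countOverlaps(minus_start, sub)
```

Because the result stays a GRanges, findOverlaps() keeps respecting seqnames and strand; passing ignore.strand = TRUE would make the strand irrelevant if that is what is wanted.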