Search
Question: Transitive overlaps of GRanges object
2
gravatar for Stefanie Tauber
2.8 years ago by
Germany
Stefanie Tauber40 wrote:

Hi,

I have a GRanges object und would like to retrieve sets of reciprocal overlaps:

library(GenomicRanges)

gr = GRanges(seqnames = rep("chr1",4),
             ranges   = IRanges(start = c(1,2,3,8),
                                 end  = c(12,6,6,10)),
             strand = rep("+", 4))
> gr
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1   [1, 12]      +
  [2]     chr1   [2,  6]      +
  [3]     chr1   [3,  6]      +
  [4]     chr1   [8, 10]      +
  -------

 

Range #1 overlaps with all other ranges, however I would like to get only those overlaps which hold in a transitive manner. such as: Range #1 overlaps with #2, #2 with #3 and #3 with #1.
At the moment I am filtering the output of findOverlaps (unsatisfactorily with the help of a loop..). Is there a more direct way to do this?

 

Many thanks for any comments,

Stefanie

   
ADD COMMENTlink modified 2.8 years ago by Hervé Pagès ♦♦ 13k • written 2.8 years ago by Stefanie Tauber40
5
gravatar for Sean Davis
2.8 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:

The problem could be thought of in graph terms.  The nodes are the regions.  The edges consist of connecting all regions that overlap.  These edges are exactly what is returned by findOverlaps().  If you take these concepts and actually construct a graph, there are a number of packages that will construct the transitive closure.  This eliminates the need to do findOverlaps() in the loop.  

Edit: Vince points out that the RBGL package has a transitive closure functionality; this should be pretty performant, I suspect.

Edit: Here is some code with a largish set of regions (10,000 exons) showing performance (about 5 seconds for the entire script, including transitive closure calculation).

# Form a large set of regions
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
exons = exons(TxDb.Hsapiens.UCSC.hg19.knownGene)[1:10000]
library(graph)
# Make a graph with edges representing overlapping regions
g = graphNEL(nodes=as.character(seq_along(exons)))
x = as.data.frame(findOverlaps(exons))
g = addEdge(from=as.character(x[,1]),to=as.character(x[,2]),g)
# Compute the transitive closure
library(RGBL)
system.time((g1 = transitive.closure(g)))
  user  system elapsed 
  2.666   0.426   3.110 
g1
# And use edges(g1) to capture the transitive relationships.
head(edges(g1))
$`1`
[1] "1"

$`2`
[1] "2" "3" "4"

$`3`
[1] "2" "3" "4"

$`4`
[1] "2" "3" "4"

$`5`
[1] "5" "6"

$`6`
[1] "5" "6"

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Sean Davis21k
2

I'll note that RBGL package has a transitive.closure function that operates on various types

of graph defined in the graph package.  Transforming the range data into a graph for this purpose

doesn't seem particularly hard, but in a naive approach there would be pairwise testing.  

ADD REPLYlink written 2.8 years ago by Vincent J. Carey, Jr.6.2k

Thanks for pointing out the graph perspective. I will consider to reformulate the problem in this context!

ADD REPLYlink written 2.8 years ago by Stefanie Tauber40

What a neat, elegant way :) thanks a lot!

ADD REPLYlink written 2.8 years ago by Stefanie Tauber40

I wish all my posts were associated with "neat" and "elegant"--thanks!

ADD REPLYlink written 2.8 years ago by Sean Davis21k
1
gravatar for Stefanie Tauber
2.8 years ago by
Germany
Stefanie Tauber40 wrote:

Hi Herve,

I think one way to achieve what I would like to do is as follow:

ov = as.matrix(findOverlaps(gr, gr))
ranges_by_overlap = split(gr[ov[, 2]], ov[, 1])

lapply(ranges_by_overlap, function(x) Reduce(intersect,x))


By doing so, all 'correct' transitive overlaps will have a valid intersection as output from Reduce(intersect,x). All non-transitive overlaps will return an empty range.

However, this is unfortunately pretty slow...
Best,

Stefanie

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Stefanie Tauber40
0
gravatar for Hervé Pagès
2.8 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Stephanie,

Not sure what you're trying to achieve exactly but note that the easiest way to "cluster ranges by overlap" is by calling reduce() with with.revmap=TRUE:

library(GenomicRanges)
gr <- GRanges(seqnames="chr1",
              ranges=IRanges(c( 1, 2, 3,  8, 15, 20,  9),
                             c(12, 6, 6, 10, 20, 25, 13)),
              strand="+")
rgr <- reduce(gr, min.gapwidth=0, with.revmap=TRUE)

Note that min.gapwidth is 1 by default and that means that adjacent ranges that do not overlap would be considered to be in the same cluster. So we use min.gapwidth=0 to avoid that.

Then:

> rgr
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |        revmap
         <Rle> <IRanges>  <Rle> | <IntegerList>
  [1]     chr1  [ 1, 13]      + |     1,2,3,...
  [2]     chr1  [15, 25]      + |           5,6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

rgr contains 1 range per cluster and its revmap metadata column contains the indices of the original ranges that belong to each cluster:

> mcols(rgr)$revmap
IntegerList of length 2
[[1]] 1 2 3 4 7
[[2]] 5 6

Hope this helps,

H.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Hervé Pagès ♦♦ 13k

Hi Herve,

thanks for pointing out the 'with.revmap' argument. I was not aware of that argument.

In the end I would like to have the intersection of ranges for which the transitive overlap holds. I would compute the intersection of those ranges via Reduce(intersect, granges). However, I have troubles to get the ranges clustered by transitive overlaps.
The problem with mcols(rgr)$revmap is that I retrieve again non-transitive overlap clusters.

Best,

Stefanie

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Stefanie Tauber40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 223 users visited in the last hour