Question

Transitive overlaps of GRanges object

2

Entering edit mode

Stefanie Tauber ▴ 40

@stefanie-tauber-3978

Last seen 2.1 years ago

Germany

Hi,

I have a GRanges object und would like to retrieve sets of reciprocal overlaps:

library(GenomicRanges)

gr = GRanges(seqnames = rep("chr1",4),
             ranges   = IRanges(start = c(1,2,3,8),
                                 end  = c(12,6,6,10)),
             strand = rep("+", 4))
> gr
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1   [1, 12]      +
  [2]     chr1   [2,  6]      +
  [3]     chr1   [3,  6]      +
  [4]     chr1   [8, 10]      +
  -------

Range #1 overlaps with all other ranges, however I would like to get only those overlaps which hold in a transitive manner. such as: Range #1 overlaps with #2, #2 with #3 and #3 with #1.
At the moment I am filtering the output of findOverlaps (unsatisfactorily with the help of a loop..). Is there a more direct way to do this?

Many thanks for any comments,

Stefanie

granges findOverlaps • 2.6k views

ADD COMMENT • link updated 9.2 years ago by Hervé Pagès 16k • written 9.2 years ago by Stefanie Tauber ▴ 40

1

Entering edit mode

Stefanie Tauber ▴ 40

@stefanie-tauber-3978

Last seen 2.1 years ago

Germany

Hi Herve,

I think one way to achieve what I would like to do is as follow:

ov = as.matrix(findOverlaps(gr, gr))
ranges_by_overlap = split(gr[ov[, 2]], ov[, 1])

lapply(ranges_by_overlap, function(x) Reduce(intersect,x))

By doing so, all 'correct' transitive overlaps will have a valid intersection as output from Reduce(intersect,x). All non-transitive overlaps will return an empty range.

However, this is unfortunately pretty slow...
Best,

Stefanie

ADD COMMENT • link 9.2 years ago Stefanie Tauber ▴ 40

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 16 hours ago

Seattle, WA, United States

Hi Stephanie,

Not sure what you're trying to achieve exactly but note that the easiest way to "cluster ranges by overlap" is by calling reduce() with with.revmap=TRUE:

library(GenomicRanges)
gr <- GRanges(seqnames="chr1",
              ranges=IRanges(c( 1, 2, 3,  8, 15, 20,  9),
                             c(12, 6, 6, 10, 20, 25, 13)),
              strand="+")
rgr <- reduce(gr, min.gapwidth=0, with.revmap=TRUE)

Note that min.gapwidth is 1 by default and that means that adjacent ranges that do not overlap would be considered to be in the same cluster. So we use min.gapwidth=0 to avoid that.

Then:

> rgr
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |        revmap
         <Rle> <IRanges>  <Rle> | <IntegerList>
  [1]     chr1  [ 1, 13]      + |     1,2,3,...
  [2]     chr1  [15, 25]      + |           5,6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

rgr contains 1 range per cluster and its revmap metadata column contains the indices of the original ranges that belong to each cluster:

> mcols(rgr)$revmap
IntegerList of length 2
[[1]] 1 2 3 4 7
[[2]] 5 6

Hope this helps,

H.

ADD COMMENT • link 9.2 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Herve,

thanks for pointing out the 'with.revmap' argument. I was not aware of that argument.

In the end I would like to have the intersection of ranges for which the transitive overlap holds. I would compute the intersection of those ranges via Reduce(intersect, granges). However, I have troubles to get the ranges clustered by transitive overlaps.
The problem with mcols(rgr)$revmap is that I retrieve again non-transitive overlap clusters.

Best,

Stefanie

ADD REPLY • link 9.2 years ago Stefanie Tauber ▴ 40

score 5 · Accepted Answer · 2015-02-17

5

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

The problem could be thought of in graph terms. The nodes are the regions. The edges consist of connecting all regions that overlap. These edges are exactly what is returned by findOverlaps(). If you take these concepts and actually construct a graph, there are a number of packages that will construct the transitive closure. This eliminates the need to do findOverlaps() in the loop.

Edit: Vince points out that the RBGL package has a transitive closure functionality; this should be pretty performant, I suspect.

Edit: Here is some code with a largish set of regions (10,000 exons) showing performance (about 5 seconds for the entire script, including transitive closure calculation).

# Form a large set of regions
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
exons = exons(TxDb.Hsapiens.UCSC.hg19.knownGene)[1:10000]
library(graph)
# Make a graph with edges representing overlapping regions
g = graphNEL(nodes=as.character(seq_along(exons)))
x = as.data.frame(findOverlaps(exons))
g = addEdge(from=as.character(x[,1]),to=as.character(x[,2]),g)
# Compute the transitive closure
library(RGBL)
system.time((g1 = transitive.closure(g)))
  user  system elapsed 
  2.666   0.426   3.110 
g1
# And use edges(g1) to capture the transitive relationships.
head(edges(g1))
$`1`
[1] "1"

$`2`
[1] "2" "3" "4"

$`3`
[1] "2" "3" "4"

$`4`
[1] "2" "3" "4"

$`5`
[1] "5" "6"

$`6`
[1] "5" "6"

ADD COMMENT • link 9.2 years ago Sean Davis 21k

2

Entering edit mode

I'll note that RBGL package has a transitive.closure function that operates on various types

of graph defined in the graph package. Transforming the range data into a graph for this purpose

doesn't seem particularly hard, but in a naive approach there would be pairwise testing.

ADD REPLY • link 9.2 years ago Vincent J. Carey, Jr. 6.7k

0

Entering edit mode

Thanks for pointing out the graph perspective. I will consider to reformulate the problem in this context!

ADD REPLY • link 9.2 years ago Stefanie Tauber ▴ 40

0

Entering edit mode

What a neat, elegant way :) thanks a lot!

ADD REPLY • link 9.2 years ago Stefanie Tauber ▴ 40

0

Entering edit mode

I wish all my posts were associated with "neat" and "elegant"--thanks!

ADD REPLY • link 9.2 years ago Sean Davis 21k