Search
Question: overlapsAny for elements in GRangesList
0
2.4 years ago by
maltethodberg100
UCPH
maltethodberg100 wrote:

I have a GRanges and a GRangesList object, i.e.:

gr # GRanges

grList # GRangesList

I wish to know what ranges in the GRanges are in each of the ranges in each element in the list, so I thought of calling overlapsAny on each element of the GRangesList like so:

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))

This gives me the desired output, however this is very slow!

Is there a faster of doing the same thing?

modified 2.4 years ago by Michael Lawrence10k • written 2.4 years ago by maltethodberg100

Perhaps you could describe your high-level use case / problem?

I'm doing some enrichment analysis, so I need to annotate a "universe" of ranges with a wide range of other annotations.

4
2.4 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

Edit: This solution is incorrect.

This is a situation where you can avoid the lapply loop by using the unlist/relist idiom:

unlisted_grList <- unlist(grList, use.names=FALSE)
relist(overlapsAny(query=unlisted_grList, subject=gr), grList)

Will be 100x faster or more than using lapply().

Correct solution:

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE

See comments below for the details.

Cheers,

H.

Is that equivalent to the code in the question? He wants the query to be gr, and the subject the grList element.

Correct, this is not equivalent to the original example, here you overlap each element of the GRlist with GR, rather than the other way around.

2

Oops, you guys are right, my bad. Here is my 2nd attempt. The unlist/relist trick is actually not needed:

library(GenomicRanges)
gr <- GRanges(c("chr1:9-14", "chr3:11-16", "chr1:4-5"))
grList <- GRangesList(GRanges("chr1:5-10"),
GRanges(c("chr2:1-4", "chr1:4-5")),
GRanges(),
GRanges("chr3:15-17"))


lapply solution (slow):

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))
overlapsList
# [[1]]
# [1]  TRUE FALSE  TRUE
# [[2]]
# [1] FALSE FALSE  TRUE
# [[3]]
# [1] FALSE FALSE FALSE
# [[4]]
# [1] FALSE  TRUE FALSE


Fast solution (no-loop):

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE
m
#       [,1]  [,2]  [,3]  [,4]
# [1,]  TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE  TRUE
# [3,]  TRUE  TRUE FALSE FALSE


The above matrix has 1 row per range in the GRanges object and 1 column per list element in the GRangesList object.

To get the result as a list of indices into the query:

as(t(hits), "List")
# IntegerList of length 4
# [[1]] 1 3
# [[2]] 3
# [[3]] integer(0)
# [[4]] 2


Hopefully this time I got it right...

Cheers,

H.

PS: I edited my initial answer above accordingly.

Nice solution. This could be implemented as a method on an adjacencyMatrix() generic, which could be promoted from graph, or graph could enhance S4Vectors.

Nice! I was already playing around with findOverlaps, but this is so much faster!

0
2.4 years ago by
United States
Michael Lawrence10k wrote:

Since your GRangesList is short (relative to something like a transcriptome), it is probably much faster to use a SimpleGenomicRangesList instead of GRangesList (a compressed list). See ?GenomicRangesList.