Search
Question: overlapsAny for elements in GRangesList
0
gravatar for maltethodberg
23 months ago by
UCPH
maltethodberg50 wrote:

I have a GRanges and a GRangesList object, i.e.:

gr # GRanges

grList # GRangesList

I wish to know what ranges in the GRanges are in each of the ranges in each element in the list, so I thought of calling overlapsAny on each element of the GRangesList like so:

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))

This gives me the desired output, however this is very slow!

Is there a faster of doing the same thing?

ADD COMMENTlink modified 23 months ago by Michael Lawrence10.0k • written 23 months ago by maltethodberg50

Perhaps you could describe your high-level use case / problem?

ADD REPLYlink written 23 months ago by Michael Lawrence10.0k

I'm doing some enrichment analysis, so I need to annotate a "universe" of ranges with a wide range of other annotations.

ADD REPLYlink written 23 months ago by maltethodberg50
4
gravatar for Hervé Pagès
23 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

Edit: This solution is incorrect.

This is a situation where you can avoid the lapply loop by using the unlist/relist idiom:

unlisted_grList <- unlist(grList, use.names=FALSE)
relist(overlapsAny(query=unlisted_grList, subject=gr), grList)

Will be 100x faster or more than using lapply().

Correct solution:

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE

See comments below for the details.

Cheers,

H.

ADD COMMENTlink modified 23 months ago • written 23 months ago by Hervé Pagès ♦♦ 13k

Is that equivalent to the code in the question? He wants the query to be gr, and the subject the grList element.

ADD REPLYlink written 23 months ago by Michael Lawrence10.0k

Correct, this is not equivalent to the original example, here you overlap each element of the GRlist with GR, rather than the other way around.

ADD REPLYlink written 23 months ago by maltethodberg50
2

Oops, you guys are right, my bad. Here is my 2nd attempt. The unlist/relist trick is actually not needed:

library(GenomicRanges)
gr <- GRanges(c("chr1:9-14", "chr3:11-16", "chr1:4-5"))
grList <- GRangesList(GRanges("chr1:5-10"), 
                      GRanges(c("chr2:1-4", "chr1:4-5")),
                      GRanges(),
                      GRanges("chr3:15-17"))

lapply solution (slow):

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))
overlapsList
# [[1]]
# [1]  TRUE FALSE  TRUE
# [[2]]
# [1] FALSE FALSE  TRUE
# [[3]]
# [1] FALSE FALSE FALSE
# [[4]]
# [1] FALSE  TRUE FALSE

Fast solution (no-loop):

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE
m
#       [,1]  [,2]  [,3]  [,4]
# [1,]  TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE  TRUE
# [3,]  TRUE  TRUE FALSE FALSE

The above matrix has 1 row per range in the GRanges object and 1 column per list element in the GRangesList object.

To get the result as a list of indices into the query:

as(t(hits), "List")
# IntegerList of length 4
# [[1]] 1 3
# [[2]] 3
# [[3]] integer(0)
# [[4]] 2

Hopefully this time I got it right...

Cheers,

H.

PS: I edited my initial answer above accordingly.

ADD REPLYlink modified 23 months ago • written 23 months ago by Hervé Pagès ♦♦ 13k

Nice solution. This could be implemented as a method on an adjacencyMatrix() generic, which could be promoted from graph, or graph could enhance S4Vectors.

ADD REPLYlink written 23 months ago by Michael Lawrence10.0k

Nice! I was already playing around with findOverlaps, but this is so much faster!

ADD REPLYlink written 23 months ago by maltethodberg50
0
gravatar for Michael Lawrence
23 months ago by
Michael Lawrence10.0k
United States
Michael Lawrence10.0k wrote:

Since your GRangesList is short (relative to something like a transcriptome), it is probably much faster to use a SimpleGenomicRangesList instead of GRangesList (a compressed list). See `?GenomicRangesList`.

 

ADD COMMENTlink written 23 months ago by Michael Lawrence10.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 387 users visited in the last hour