Search
Question: overlapsAny for elements in GRangesList
0
gravatar for maltethodberg
18 months ago by
UCPH
maltethodberg40 wrote:

I have a GRanges and a GRangesList object, i.e.:

gr # GRanges

grList # GRangesList

I wish to know what ranges in the GRanges are in each of the ranges in each element in the list, so I thought of calling overlapsAny on each element of the GRangesList like so:

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))

This gives me the desired output, however this is very slow!

Is there a faster of doing the same thing?

ADD COMMENTlink modified 18 months ago by Michael Lawrence9.8k • written 18 months ago by maltethodberg40

Perhaps you could describe your high-level use case / problem?

ADD REPLYlink written 18 months ago by Michael Lawrence9.8k

I'm doing some enrichment analysis, so I need to annotate a "universe" of ranges with a wide range of other annotations.

ADD REPLYlink written 18 months ago by maltethodberg40
4
gravatar for Hervé Pagès
18 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

Edit: This solution is incorrect.

This is a situation where you can avoid the lapply loop by using the unlist/relist idiom:

unlisted_grList <- unlist(grList, use.names=FALSE)
relist(overlapsAny(query=unlisted_grList, subject=gr), grList)

Will be 100x faster or more than using lapply().

Correct solution:

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE

See comments below for the details.

Cheers,

H.

ADD COMMENTlink modified 18 months ago • written 18 months ago by Hervé Pagès ♦♦ 13k

Is that equivalent to the code in the question? He wants the query to be gr, and the subject the grList element.

ADD REPLYlink written 18 months ago by Michael Lawrence9.8k

Correct, this is not equivalent to the original example, here you overlap each element of the GRlist with GR, rather than the other way around.

ADD REPLYlink written 18 months ago by maltethodberg40
2

Oops, you guys are right, my bad. Here is my 2nd attempt. The unlist/relist trick is actually not needed:

library(GenomicRanges)
gr <- GRanges(c("chr1:9-14", "chr3:11-16", "chr1:4-5"))
grList <- GRangesList(GRanges("chr1:5-10"), 
                      GRanges(c("chr2:1-4", "chr1:4-5")),
                      GRanges(),
                      GRanges("chr3:15-17"))

lapply solution (slow):

overlapsList <- lapply(grList, function(x) overlapsAny(query=gr, subject=x))
overlapsList
# [[1]]
# [1]  TRUE FALSE  TRUE
# [[2]]
# [1] FALSE FALSE  TRUE
# [[3]]
# [1] FALSE FALSE FALSE
# [[4]]
# [1] FALSE  TRUE FALSE

Fast solution (no-loop):

hits <- findOverlaps(gr, grList)
m <- matrix(FALSE, nrow=queryLength(hits), ncol=subjectLength(hits))
m[as.matrix(hits)] <- TRUE
m
#       [,1]  [,2]  [,3]  [,4]
# [1,]  TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE  TRUE
# [3,]  TRUE  TRUE FALSE FALSE

The above matrix has 1 row per range in the GRanges object and 1 column per list element in the GRangesList object.

To get the result as a list of indices into the query:

as(t(hits), "List")
# IntegerList of length 4
# [[1]] 1 3
# [[2]] 3
# [[3]] integer(0)
# [[4]] 2

Hopefully this time I got it right...

Cheers,

H.

PS: I edited my initial answer above accordingly.

ADD REPLYlink modified 18 months ago • written 18 months ago by Hervé Pagès ♦♦ 13k

Nice solution. This could be implemented as a method on an adjacencyMatrix() generic, which could be promoted from graph, or graph could enhance S4Vectors.

ADD REPLYlink written 18 months ago by Michael Lawrence9.8k

Nice! I was already playing around with findOverlaps, but this is so much faster!

ADD REPLYlink written 18 months ago by maltethodberg40
0
gravatar for Michael Lawrence
18 months ago by
United States
Michael Lawrence9.8k wrote:

Since your GRangesList is short (relative to something like a transcriptome), it is probably much faster to use a SimpleGenomicRangesList instead of GRangesList (a compressed list). See `?GenomicRangesList`.

 

ADD COMMENTlink written 18 months ago by Michael Lawrence9.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 321 users visited in the last hour