Question: Intersecting a Granges and a GRangesList elementwise
0
4.3 years ago by
d r150
Israel
d r150 wrote:

Hello

I have a Granges object (gr) and a GRangesList (grlist) of the same length. I want to intersect them element-wise, that is, I want to know if any element in grlist[[i]] is (fully) contained in gr[i].

I can of course do something of the loop variant:

intersect<-lapply(seq(length(gr)), function(i) overlapsAny(gr[i],grlist[[i]]).

Is there a better way to do this that I'm missing? mapply would have been useful

here but as far as I know it doesn't accept GRanges.

If it's of any relevance, the members of grlist are exons of genes. I know (from a previous step in my pipeline) that gr[i] partially overlaps the gene whose exons are represented by grlist[[i]], and I ant to know if there is at least one exon contained in it.

Example:

gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)
Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].

Dolev Rahat

genomicranges findoverlaps • 932 views
modified 4.3 years ago by Hervé Pagès ♦♦ 14k • written 4.3 years ago by d r150
Answer: Intersecting a Granges and a GRangesList elementwise
2
4.3 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi,

To use mapply() on a GRanges object you have to coerce it to a GRangesList object first:

as(gr, "GRangesList")
# GRangesList object of length 2:
# [[1]]
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#          <Rle> <IRanges>  <Rle>
#   [1]     chr1  [10, 50]      *
#
# [[2]]
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#   [1]     chr2 [20, 100]      *
#
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths


This coercion returns a GRangesList object parallel to the GRanges object (i.e. with 1 list element per range in the GRanges object).

mapply(function(gene, rg) any(gene %within% rg), grlist, as(gr, "GRangesList"))

H.

Thanks! I did not know that mapply() works on GRangesList. Good to know.

FWIW mapply() works on anything that supports [[ (double-bracket subsetting). Right now [[ doesn't work on a GRanges but coercing the GRanges to GRangesList is a workaround to make [[ work on it:

gr[[i]]                      # Doesn't work.
as(gr, "GRangesList")[[i]]   # Works and does what gr[[i]] would do it
# was working.

H.

Answer: Intersecting a Granges and a GRangesList elementwise
1
4.3 years ago by
United States
Michael Lawrence11k wrote:

There are element-wise "set" operations that start with "p". To detect "within", we need pintersect. If we can assume that the ranges inside of each GRangesList element are non-adjacent and non-overlapping, then we can check that the width of that range is equal to the width of the query.

int <- pintersect(grlist, gr)
sum(width(int)) == width(gr)

Recall that the above assumes the ranges inside the GRangesList elements are not overlapping nor adjacent. If they can overlap, then we need to flatten the GRangesList (grlist), align the GRanges (gr) to it, perform the vectorized comparison, and aggregate. This should be faster than direct use of mapply, but any is not as efficient as it could be.

grl_flat <- unlist(grlist, use.names=FALSE)
gr_rep <- gr[togroup(grl_flat)]
int <- pintersect(grl_flat, gr_rep)
any(relist(width(gr) == width(int), grlist))

What I want is to know, for a given i if there is k such that grlist[[i]][k] is within gr[i].
For example:
gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)
Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].
Can this be done?

I edited my answer now that I understand what you want.