Intersecting a Granges and a GRangesList elementwise
2
0
Entering edit mode
d r ▴ 150
@d-r-5459
Last seen 6.9 years ago
Israel

Hello

I have a Granges object (gr) and a GRangesList (grlist) of the same length. I want to intersect them element-wise, that is, I want to know if any element in grlist[[i]] is (fully) contained in gr[i].

I can of course do something of the loop variant:

intersect<-lapply(seq(length(gr)), function(i) overlapsAny(gr[i],grlist[[i]]).

Is there a better way to do this that I'm missing? mapply would have been useful

here but as far as I know it doesn't accept GRanges.

If it's of any relevance, the members of grlist are exons of genes. I know (from a previous step in my pipeline) that gr[i] partially overlaps the gene whose exons are represented by grlist[[i]], and I ant to know if there is at least one exon contained in it.

Example:

   gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
    gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
    gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
    grlist<-GRangesList(gr1,gr2)
    Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
    However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].
 

Thanks in advacne

Dolev Rahat

findoverlaps genomicranges • 2.3k views
ADD COMMENT
2
Entering edit mode
@herve-pages-1542
Last seen 10 hours ago
Seattle, WA, United States

Hi,

To use mapply() on a GRanges object you have to coerce it to a GRangesList object first:

as(gr, "GRangesList")
# GRangesList object of length 2:
# [[1]] 
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#          <Rle> <IRanges>  <Rle>
#   [1]     chr1  [10, 50]      *
# 
# [[2]] 
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#   [1]     chr2 [20, 100]      *
# 
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths

This coercion returns a GRangesList object parallel to the GRanges object (i.e. with 1 list element per range in the GRanges object).

Then for your use case:

mapply(function(gene, rg) any(gene %within% rg), grlist, as(gr, "GRangesList"))

H.

ADD COMMENT
0
Entering edit mode

Thanks! I did not know that mapply() works on GRangesList. Good to know.

ADD REPLY
0
Entering edit mode

FWIW mapply() works on anything that supports [[ (double-bracket subsetting). Right now [[ doesn't work on a GRanges but coercing the GRanges to GRangesList is a workaround to make [[ work on it:

gr[[i]]                      # Doesn't work.
as(gr, "GRangesList")[[i]]   # Works and does what gr[[i]] would do it
                             # was working.

H.

ADD REPLY
1
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

There are element-wise "set" operations that start with "p". To detect "within", we need pintersect. If we can assume that the ranges inside of each GRangesList element are non-adjacent and non-overlapping, then we can check that the width of that range is equal to the width of the query.

int <- pintersect(grlist, gr)
sum(width(int)) == width(gr)

Recall that the above assumes the ranges inside the GRangesList elements are not overlapping nor adjacent. If they can overlap, then we need to flatten the GRangesList (grlist), align the GRanges (gr) to it, perform the vectorized comparison, and aggregate. This should be faster than direct use of mapply, but any is not as efficient as it could be.

grl_flat <- unlist(grlist, use.names=FALSE)
gr_rep <- gr[togroup(grl_flat)]
int <- pintersect(grl_flat, gr_rep)
any(relist(width(gr) == width(int), grlist))

 

 

 

ADD COMMENT
0
Entering edit mode

@Michael Lawerece Thanks for your reply. However,I think neither of your suggestions does what I want.
What I want is to know, for a given i if there is k such that grlist[[i]][k] is within gr[i].
For example:
gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)
Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].
Can this be done?

ADD REPLY
0
Entering edit mode

I edited my answer now that I understand what you want.

ADD REPLY
0
Entering edit mode

I edited my answer now that I understand what you want.

ADD REPLY

Login before adding your answer.

Traffic: 817 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6