Question: Intersecting a Granges and a GRangesList elementwise
0
gravatar for d r
4.3 years ago by
d r150
Israel
d r150 wrote:

Hello

I have a Granges object (gr) and a GRangesList (grlist) of the same length. I want to intersect them element-wise, that is, I want to know if any element in grlist[[i]] is (fully) contained in gr[i].

I can of course do something of the loop variant:

intersect<-lapply(seq(length(gr)), function(i) overlapsAny(gr[i],grlist[[i]]).

Is there a better way to do this that I'm missing? mapply would have been useful

here but as far as I know it doesn't accept GRanges.

If it's of any relevance, the members of grlist are exons of genes. I know (from a previous step in my pipeline) that gr[i] partially overlaps the gene whose exons are represented by grlist[[i]], and I ant to know if there is at least one exon contained in it.

Example:

   gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
    gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
    gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
    grlist<-GRangesList(gr1,gr2)
    Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
    However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].
 

Thanks in advacne

Dolev Rahat

genomicranges findoverlaps • 932 views
ADD COMMENTlink modified 4.3 years ago by Hervé Pagès ♦♦ 14k • written 4.3 years ago by d r150
Answer: Intersecting a Granges and a GRangesList elementwise
2
gravatar for Hervé Pagès
4.3 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi,

To use mapply() on a GRanges object you have to coerce it to a GRangesList object first:

as(gr, "GRangesList")
# GRangesList object of length 2:
# [[1]] 
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#          <Rle> <IRanges>  <Rle>
#   [1]     chr1  [10, 50]      *
# 
# [[2]] 
# GRanges object with 1 range and 0 metadata columns:
#       seqnames    ranges strand
#   [1]     chr2 [20, 100]      *
# 
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths

This coercion returns a GRangesList object parallel to the GRanges object (i.e. with 1 list element per range in the GRanges object).

Then for your use case:

mapply(function(gene, rg) any(gene %within% rg), grlist, as(gr, "GRangesList"))

H.

ADD COMMENTlink written 4.3 years ago by Hervé Pagès ♦♦ 14k

Thanks! I did not know that mapply() works on GRangesList. Good to know.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by d r150

FWIW mapply() works on anything that supports [[ (double-bracket subsetting). Right now [[ doesn't work on a GRanges but coercing the GRanges to GRangesList is a workaround to make [[ work on it:

gr[[i]]                      # Doesn't work.
as(gr, "GRangesList")[[i]]   # Works and does what gr[[i]] would do it
                             # was working.

H.

ADD REPLYlink written 4.3 years ago by Hervé Pagès ♦♦ 14k
Answer: Intersecting a Granges and a GRangesList elementwise
1
gravatar for Michael Lawrence
4.3 years ago by
United States
Michael Lawrence11k wrote:

There are element-wise "set" operations that start with "p". To detect "within", we need pintersect. If we can assume that the ranges inside of each GRangesList element are non-adjacent and non-overlapping, then we can check that the width of that range is equal to the width of the query.

int <- pintersect(grlist, gr)
sum(width(int)) == width(gr)

Recall that the above assumes the ranges inside the GRangesList elements are not overlapping nor adjacent. If they can overlap, then we need to flatten the GRangesList (grlist), align the GRanges (gr) to it, perform the vectorized comparison, and aggregate. This should be faster than direct use of mapply, but any is not as efficient as it could be.

grl_flat <- unlist(grlist, use.names=FALSE)
gr_rep <- gr[togroup(grl_flat)]
int <- pintersect(grl_flat, gr_rep)
any(relist(width(gr) == width(int), grlist))

 

 

 

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Michael Lawrence11k

@Michael Lawerece Thanks for your reply. However,I think neither of your suggestions does what I want.
What I want is to know, for a given i if there is k such that grlist[[i]][k] is within gr[i].
For example:
gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)
Now, gr1[1] is within gr[1], hence there is a range in grlist[[1]] that is within gr[1] so I want the function to return TRUE for gr[1].
However, none of the ranges in gr2 (and hence grlist[[2]]) are within gr[2], thus I want the function to return FALSE for gr[2].
Can this be done?

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by d r150

I edited my answer now that I understand what you want.

ADD REPLYlink written 4.3 years ago by Michael Lawrence11k

I edited my answer now that I understand what you want.

ADD REPLYlink written 4.3 years ago by Michael Lawrence11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 358 users visited in the last hour