Question

Intersecting a Granges and a GRangesList using within

0

Entering edit mode

d r ▴ 150

@d-r-5459

Last seen 6.2 years ago

Israel

Hello, I have a GRanges (gr) object and a GRangesList (grlist).

I want to find elements in grlist that fully overlap elements of gr.

That is, I want to know, for a given i and j if there is k such that

grlist[j][[k]] is fully contained within gr[i]. It is important

that the values of both i and j can be extracted from the function (k is not important for me).

For example, if I have

gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)

gr[1][1] is within gr[1]. However no element of gr2 is fully contained in gr[1].

Further, no element of either gr1 or gr2 is fully contained in gr[2].

So I would the function call f(gr,grlist) to return (TRUE,FALSE,FALSE,FALSE)

I tried using:

flat<-unlist(grlist,use.names=FALSE)
g<-relist(flat%within%gr, grlist)

But this only tells me for an element of grlist if it is contained in any element of gr.

Is there any way to extract which which element of gr contains it?

Note: I did ask a similar question (https://support.bioconductor.org/p/71619), but on an afterthought decided that mapply() style comparison does not suit my needs (I need to compare every i to every j, not just j=i). Sorry for the confusion.

Thanks in advance

Dolev Rahat

granges grangeslist findoverlaps • 2.1k views

ADD COMMENT • link updated 8.6 years ago by Hervé Pagès 16k • written 8.6 years ago by d r ▴ 150

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 7 hours ago

Seattle, WA, United States

Hi,

FWIW I'd like to mention that type="within" is supported between a GRangesList query and a GRangesList subject, and in that case a hit between query[i] and subject[j] is reported only if all the ranges in the former are within a range in the latter. So in your case, to find the elements in grlist that fully contain gr (if I understand correctly what you are trying to do) you can put gr inside a GRangesList of length 1:

subjectHits(findOverlaps(GRangesList(gr), grlist, type="within"))

H.

ADD COMMENT • link 8.6 years ago Hervé Pagès 16k

score 2 · Accepted Answer · 2015-09-10

Hi,

In your example, both ranges of 'gr' are contained within ranges of 'grlist'.
None of the ranges in 'grlist' are within the ranges in 'gr'.

gr <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(50,100)))
gr1 <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(500,1000)))
gr2 <- GRanges(c('chr1','chr4'), IRanges(c(10,1000), end=c(10000,4000)))
grlist <- GRangesList(gr1,gr2)

Use findOverlaps() to identify 'gr' ranges that fall "within" 'grlist'. The return value is a 'Hits' object with information for all range combinations. This way you don't need to call findOverlaps() for each range in 'gr'.

flat <- unlist(grlist, use.names=FALSE)
fo <- findOverlaps(gr, flat, type = "within")
> fo
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           3
  [3]         2           2
  -------
  queryLength: 2
  subjectLength: 4

Next, separate the information by range in 'gr'. Note that the output will be the same length as the number of ranges in 'gr' that had hits - this is not necessarily the same length as 'gr'.

> lapply(lst, function(i) {
+     vec <- logical(subjectLength(fo))
+     vec[i]  <- TRUE
+     vec
+ })
$`1`
[1]  TRUE FALSE  TRUE FALSE

$`2`
[1] FALSE  TRUE FALSE FALSE

If you want 'vec' as a list of the same geometry as 'grlist' change the last line in the lapply to relist(vec, grlist).

Extracting the relevant ranges of 'grlist' can be done with the Hits object, you don't need to create your own logical vector.

> flat[subjectHits(fo)]
GRanges object with 3 ranges and 0 metadata columns:
      seqnames      ranges strand
         <Rle>   <IRanges>  <Rle>
  [1]     chr1 [10,   500]      *
  [2]     chr1 [10, 10000]      *
  [3]     chr2 [20,  1000]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

I think this is what you wanted. Let me know if I misunderstood and we can try
again.

Valerie