Search
Question: Intersecting a Granges and a GRangesList using within
0
2.9 years ago by
d r150
Israel
d r150 wrote:

Hello, I have a GRanges (gr) object and a GRangesList (grlist).

I want to find elements in grlist that fully overlap elements of gr.

That is, I want to know, for a given i and j if there is k such that

grlist[j][[k]] is fully contained within gr[i]. It is important

that the values of both i and j can be extracted from the function (k is not important for me).

For example, if I have

gr<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(50,100))
gr1<-GRanges(seqnames=c('chr1','chr2'), ranges=IRanges(start=c(10,20),end=(500,1000))
gr2<-GRanges(seqnames='chr1','chr4' ranges=IRanges(start=c(10,1000),end=c(10000,4000))
grlist<-GRangesList(gr1,gr2)

gr[1][1] is within gr[1]. However no element of gr2 is fully contained in gr[1].

Further, no element of either gr1 or gr2 is fully contained in gr[2].

So I would the function call f(gr,grlist) to return (TRUE,FALSE,FALSE,FALSE)

I tried using:

flat<-unlist(grlist,use.names=FALSE)
g<-relist(flat%within%gr, grlist)

But this only tells me for an element of grlist if it is contained in any element of gr.

Is there any way to extract which which element of gr contains it?

Note: I did ask a similar question (https://support.bioconductor.org/p/71619), but on an afterthought decided that mapply() style comparison does not suit my needs (I need to compare every i to every j, not just j=i). Sorry for the confusion.

Dolev Rahat

modified 2.8 years ago by Hervé Pagès ♦♦ 13k • written 2.9 years ago by d r150
2
2.9 years ago by
Valerie Obenchain ♦♦ 6.5k
United States
Valerie Obenchain ♦♦ 6.5k wrote:

Hi,

In your example, both ranges of 'gr' are contained within ranges of 'grlist'.
None of the ranges in 'grlist' are within the ranges in 'gr'.

gr <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(50,100)))
gr1 <- GRanges(c('chr1','chr2'), IRanges(c(10,20), end=c(500,1000)))
gr2 <- GRanges(c('chr1','chr4'), IRanges(c(10,1000), end=c(10000,4000)))
grlist <- GRangesList(gr1,gr2)

Use findOverlaps() to identify 'gr' ranges that fall "within" 'grlist'. The return value is a 'Hits' object with information for all range combinations. This way you don't need to call findOverlaps() for each range in 'gr'.

flat <- unlist(grlist, use.names=FALSE)
fo <- findOverlaps(gr, flat, type = "within")
> fo
Hits object with 3 hits and 0 metadata columns:
queryHits subjectHits
<integer>   <integer>
[1]         1           1
[2]         1           3
[3]         2           2
-------
queryLength: 2
subjectLength: 4

Next, separate the information by range in 'gr'. Note that the output will be the same length as the number of ranges in 'gr' that had hits - this is not necessarily the same length as 'gr'.

> lapply(lst, function(i) {
+     vec <- logical(subjectLength(fo))
+     vec[i]  <- TRUE
+     vec
+ })
$1 [1] TRUE FALSE TRUE FALSE$2
[1] FALSE  TRUE FALSE FALSE

If you want 'vec' as a list of the same geometry as 'grlist' change the last line in the lapply to relist(vec, grlist).

Extracting the relevant ranges of 'grlist' can be done with the Hits object, you don't need to create your own logical vector.

> flat[subjectHits(fo)]
GRanges object with 3 ranges and 0 metadata columns:
seqnames      ranges strand
<Rle>   <IRanges>  <Rle>
[1]     chr1 [10,   500]      *
[2]     chr1 [10, 10000]      *
[3]     chr2 [20,  1000]      *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths

I think this is what you wanted. Let me know if I misunderstood and we can try
again.

Valerie

@Valerie Obenchain. Thanks for your reply. The output in your example seems what I want. However, In your call to lapply(), you iterate on an object called lst? Can you please tell me what it this object. I did not see it defined in your reply. Thanks again.

Oh my, I completely missed that line when I pasted. Sorry about that. The missing 'lst' is a split of the subject hits (ranges in 'flat') by the query hits (ranges in 'gr').

> lst <- split(subjectHits(fo), queryHits(fo))
> lst
$1 [1] 1 3$2
[1] 2 

Valerie

Thanks! just what I needed.

0
2.8 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

FWIW I'd like to mention that type="within" is supported between a GRangesList query and a GRangesList subject, and in that case a hit between query[i] and subject[j] is reported only if all the ranges in the former are within a range in the latter. So in your case, to find the elements in grlist that fully contain gr (if I understand correctly what you are trying to do) you can put gr inside a GRangesList of length 1:

subjectHits(findOverlaps(GRangesList(gr), grlist, type="within"))

H.