Compare ranges from two GRanges and check if the ranges from the first GRanges is contained in another.
1
0
Entering edit mode
omarrafiqued ▴ 50
@omarrafiqued-21833
Last seen 9 months ago
India

I have the following ranges as GRanges data {Ranges_1}: 1-2, 2-4, 3-6, 6-8, 2-8, 8-11.

{Ranges_2} 1-5, 6-10, 11-15,

I need a mapping from Ranges1 to Ranges2. Something like following (1-2) -> (1-5), (2-4) -> (1-5), (3-6) -> (1-5) & (6-10), (5-7) -> (1-5) & (6-10), (6-8) -> (6-10), (8-11) -> (6-10) & (11-15).

Could someone please help me with the R code for this?

GRanges IRanges • 2.5k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States
> gr1 <- GRanges(rep("chr1", 6), IRanges(c(1,2,3,6,2,8), c(2,4,6,8,8,11)))
> gr2 <- GRanges(rep("chr1", 3), IRanges(c(1,6,11), c(5,10,15)))

> fo <- findOverlaps(gr1, gr2)
> fo
Hits object with 9 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         2           1
  [3]         3           1
  [4]         3           2
  [5]         4           2
  [6]         5           1
  [7]         5           2
  [8]         6           2
  [9]         6           3
  -------
  queryLength: 6 / subjectLength: 3
> DataFrame(gr1 = gr1[queryHits(fo)], gr2 = gr2[subjectHits(fo)])
DataFrame with 9 rows and 2 columns
        gr1        gr2
  <GRanges>  <GRanges>
1  chr1:1-2   chr1:1-5
2  chr1:2-4   chr1:1-5
3  chr1:3-6   chr1:1-5
4  chr1:3-6  chr1:6-10
5  chr1:6-8  chr1:6-10
6  chr1:2-8   chr1:1-5
7  chr1:2-8  chr1:6-10
8 chr1:8-11  chr1:6-10
9 chr1:8-11 chr1:11-15

ADD COMMENT
1
Entering edit mode

findOverlapPairs() is also relevant.

ADD REPLY
0
Entering edit mode

What if I dont want one-to-many mapping. e.g., Above there is:

3  chr1:3-6   chr1:1-5
4  chr1:3-6  chr1:6-10

Since, chr1:3-6 lies more in chr1:1-5, so , I would not like 4 chr1:3-6 chr1:6-10 in the result. Is there any elegant way of doing it?

ADD REPLY
1
Entering edit mode

You'll have to wait for Michael to respond if you want elegant (see above). If you want super hackification, I'm your man.

> fo <- findOverlaps(gr1, gr2)
> pol <- findOverlapPairs(gr1, gr2)
> lst <- split(data.frame(width = width(pintersect(pol)), ind = 1:9), queryHits(fo), drop = FALSE)
> pol[sapply(lst, function(x) x[which.max(x$width),"ind"]),]
Pairs object with 6 pairs and 0 metadata columns:
          first    second
      <GRanges> <GRanges>
  [1]  chr1:1-2  chr1:1-5
  [2]  chr1:2-4  chr1:1-5
  [3]  chr1:3-6  chr1:1-5
  [4]  chr1:6-8 chr1:6-10
  [5]  chr1:2-8  chr1:1-5
  [6] chr1:8-11 chr1:6-10
ADD REPLY
1
Entering edit mode

It's possible to only find the overlaps once:

fo <- findOverlaps(gr1, gr2)
pol <- Pairs(gr1, gr2, hits=fo)

Also you could find the pairs with the most overlap doing this

pol[which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE)]

It may be more intuitive to use plyranges, but not sure how to do it.

ADD REPLY
0
Entering edit mode

One more modification:

How to drop a range which belongs to the first GRanges and does not map to any range in the second GRanges.

For such case I want the mapping to be something like:

[x]  chr1:a-b  NA
ADD REPLY
1
Entering edit mode

I think that would have to be separate and Michael's more elegant method won't work?

> gr1 <- GRanges(rep("chr1", 7), IRanges(c(1,2,3,6,2,8,20), c(2,4,6,8,8,11,42)))
> gr2 <- GRanges(rep("chr1", 3), IRanges(c(1,6,11), c(5,10,15)))
> ind <- ! gr1 %over% gr2
> fo <- findOverlaps(gr1, gr2)
> pol <- Pairs(gr1, gr2, hits = fo)
> pol[which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE)]
Error: subscript contains NAs
## Urgh
> lst <- split(data.frame(width = width(pintersect(pol)), ind = 1:9), queryHits(fo), drop = FALSE)
> pol[sapply(lst, function(x) x[which.max(x$width),"ind"]),]
Pairs object with 6 pairs and 0 metadata columns:
          first    second
      <GRanges> <GRanges>
  [1]  chr1:1-2  chr1:1-5
  [2]  chr1:2-4  chr1:1-5
  [3]  chr1:3-6  chr1:1-5
  [4]  chr1:6-8 chr1:6-10
  [5]  chr1:2-8  chr1:1-5
  [6] chr1:8-11 chr1:6-10
> DataFrame(first = gr1[ind], second = rep("NA", sum(ind)))
DataFrame with 1 row and 2 columns
       first      second
   <GRanges> <character>
1 chr1:20-42          NA

ADD REPLY
0
Entering edit mode

Getting an error:

Warning message: In .Seqinfo.mergexy(x, y) : The 2 combined objects have no sequence levels in common. (Use suppressWarnings() to suppress this warning.)

I know this is a warning, but the output is empty.

ADD REPLY
1
Entering edit mode

Modifying my original solution, you can exclude NA cases and subset the Hits object. Since Hits tracks the "universe" of the original input, like a factor, you can pass it to a HelloRanges utility to perform the left outer join and generate the closest thing to a NA value.

fo <- findOverlaps(gr1, gr2)
fo <- fo[na.omit(which.max(relist(width(pintersect(pol)), as(fo, "List")), global=TRUE))]
HelloRanges::pair(gr1, gr2, fo, all.x=TRUE)
ADD REPLY
0
Entering edit mode

By elegant, I mean something that works and is easy to grasp ans you did that exactly. Thanks again.

ADD REPLY

Login before adding your answer.

Traffic: 939 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6