Extract coordinates of overlapping genomic intervals
rubi ▴ 100
@rubi-6462
Last seen 4.0 years ago

Hi,

I have two sets of genomic ranges that I'm intersecting with findOverlaps from the GenomicRanges package:

library(GenomicRanges)

df1 <- data.frame(chr=rep("chr1",6), start=c(10033259,10060726,98674166,10067579,10067607,11169988), end=c(10033289,10060783,98674223,10067654,10067664,11170044), strand=c("-","-","+","+","+","+"))

df2 <- data.frame(chr=rep("chr1",3), start=c(10024601,10033258,10033258), end=c(10038168,10033323,10033323), strand=c("-","-","-"))

df1.gr <- makeGRangesFromDataFrame(df1, seqnames.field="chr", start.field="start", end.field="end", strand.field="strand")

df2.gr <- makeGRangesFromDataFrame(df2, seqnames.field="chr", start.field="start", end.field="end", strand.field="strand")

dfs.ol <- findOverlaps(df1.gr, df2.gr)

My question is how to extract the actual overlapping coordinates of each of the hits in the returned value of findOverlaps (dfs.ol)?

I know that the intersect function returns the collapsed intervals of the query ranges that intersect the subject ranges. But what I really need, for each overlap between df1.gr and df2.gr, are the coordinates of the overlap, in addition to the indices of the overlapping ranges (as in the returned Hits object).

genomicranges findoverlaps intersect • 1.1k views
@jeff-johnston-6497
Last seen 4.7 years ago
United States

You can use pintersect:

overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)], df2.gr[subjectHits(dfs.ol)])

If you want all the results in one object, you can add the indices as metadata columns:

overlaps.gr$df1_hit <- queryHits(dfs.ol)

overlaps.gr$df2_hit <- subjectHits(dfs.ol)
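For reference, here is a minimal end-to-end sketch of the approach above, using the data from the question. It assumes a reasonably recent GenomicRanges installation; the sanity check at the end relies on the fact that only the first range in df1 overlaps anything in df2 on the same strand, so every reported overlap should match that range.

```r
library(GenomicRanges)

# Data from the question
df1 <- data.frame(chr = rep("chr1", 6),
                  start = c(10033259, 10060726, 98674166, 10067579, 10067607, 11169988),
                  end = c(10033289, 10060783, 98674223, 10067654, 10067664, 11170044),
                  strand = c("-", "-", "+", "+", "+", "+"))
df2 <- data.frame(chr = rep("chr1", 3),
                  start = c(10024601, 10033258, 10033258),
                  end = c(10038168, 10033323, 10033323),
                  strand = c("-", "-", "-"))

df1.gr <- makeGRangesFromDataFrame(df1, seqnames.field = "chr", start.field = "start",
                                   end.field = "end", strand.field = "strand")
df2.gr <- makeGRangesFromDataFrame(df2, seqnames.field = "chr", start.field = "start",
                                   end.field = "end", strand.field = "strand")

# One row per overlapping query/subject pair (strand-aware by default)
dfs.ol <- findOverlaps(df1.gr, df2.gr)

# Parallel intersection: element i is the overlap of the i-th query/subject pair
overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)], df2.gr[subjectHits(dfs.ol)])

# Keep the indices of the original ranges alongside the overlap coordinates
overlaps.gr$df1_hit <- queryHits(dfs.ol)
overlaps.gr$df2_hit <- subjectHits(dfs.ol)

overlaps.gr
```

Each element of overlaps.gr is the intersection of one query/subject pair, so its length equals length(dfs.ol).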



Does that report the overlap interval though?


Yes, it generates the overlapping interval for each row (a query/subject pair) in your Hits object.


In the overlaps.gr object?

I only see the indices of the query and subject hits, not the overlap's coordinates. How do you extract those?


Since overlaps.gr is a GRanges object, you can use start(), end() and seqnames() to extract the coordinates of the overlapping intervals.
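As a small illustration of those accessors, here is a sketch that collects the overlap coordinates and the hit indices into a plain data frame. The toy ranges (q.gr, s.gr) and the column names are made up for the example and are not from the question.

```r
library(GenomicRanges)

# Hypothetical toy ranges, chosen so the overlaps are easy to check by eye
q.gr <- GRanges("chr1", IRanges(start = c(100, 500), end = c(200, 600)), strand = "+")
s.gr <- GRanges("chr1", IRanges(start = c(150, 550), end = c(300, 580)), strand = "+")

hits <- findOverlaps(q.gr, s.gr)
overlaps.gr <- pintersect(q.gr[queryHits(hits)], s.gr[subjectHits(hits)])

# Pull the coordinates out with the standard GRanges accessors
coords <- data.frame(chr = as.character(seqnames(overlaps.gr)),
                     start = start(overlaps.gr),
                     end = end(overlaps.gr),
                     strand = as.character(strand(overlaps.gr)),
                     query_idx = queryHits(hits),
                     subject_idx = subjectHits(hits))
coords
```

Here the first pair overlaps on 150-200 and the second on 550-580. Calling as.data.frame(overlaps.gr) is another way to get the coordinates, together with any metadata columns, in one step.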