Hi,
I have two sets of genomic ranges which I'm intersecting using the findOverlaps function of the GenomicRanges package:
library(GenomicRanges)
df1 <- data.frame(chr=rep("chr1",6), start=c(10033259,10060726,98674166,10067579,10067607,11169988), end=c(10033289,10060783,98674223,10067654,10067664,11170044), strand=c("-","-","+","+","+","+"))
df2 <- data.frame(chr=rep("chr1",3),start=c(10024601,10033258,10033258),end=c(10038168,10033323,10033323),strand=c("-","-","-"))
df1.gr <- makeGRangesFromDataFrame(df1,seqnames.field="chr",start.field="start",end.field="end",strand.field="strand")
df2.gr <- makeGRangesFromDataFrame(df2,seqnames.field="chr",start.field="start",end.field="end",strand.field="strand")
dfs.ol <- findOverlaps(df1.gr,df2.gr)
My question is how to extract the actual overlapping coordinates of each of the hits in the returned value of findOverlaps (dfs.ol)?
I know that the intersect function returns the collapsed intervals in the query genomic ranges which intersect with the subject genomic ranges. But what I really need, for each overlap between df1.gr and df2.gr, are the coordinates of the overlap, in addition to the indices of the genomic ranges which overlap (in the returned Hits object).
Does that report the overlap interval though?
Yes, it generates the overlapping interval for each row (a query/subject pair) in your Hits object.
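The thread does not show how overlaps.gr was built, so the following is only a sketch of one common approach, assuming it comes from pintersect() applied to the query and subject ranges indexed by the hits. The name overlaps.gr is taken from the replies; everything else reuses the data from the question.

```r
library(GenomicRanges)

# Data from the question
df1 <- data.frame(chr = rep("chr1", 6),
                  start = c(10033259, 10060726, 98674166, 10067579, 10067607, 11169988),
                  end = c(10033289, 10060783, 98674223, 10067654, 10067664, 11170044),
                  strand = c("-", "-", "+", "+", "+", "+"))
df2 <- data.frame(chr = rep("chr1", 3),
                  start = c(10024601, 10033258, 10033258),
                  end = c(10038168, 10033323, 10033323),
                  strand = c("-", "-", "-"))
df1.gr <- makeGRangesFromDataFrame(df1)
df2.gr <- makeGRangesFromDataFrame(df2)

# One row per overlapping query/subject pair
dfs.ol <- findOverlaps(df1.gr, df2.gr)

# Assumed construction: expand both sets so they are parallel to the hits,
# then intersect them pairwise (one overlap interval per hit)
overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)],
                          df2.gr[subjectHits(dfs.ol)])
overlaps.gr
```

Unlike intersect(), which collapses everything into a set of disjoint ranges, pintersect() works element by element, so overlaps.gr keeps the same length and order as dfs.ol.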
In the overlaps.gr object?
I only see the indices of the query and hit but not the overlap's coordinates. How do you extract that?
Since overlaps.gr is a GRanges object, you can use start(), end() and seqnames() to extract the coordinates of the overlapping intervals.
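To get the indices and the coordinates side by side, something along these lines might work. It is a minimal sketch that again assumes overlaps.gr was produced with pintersect(); the data frame name hits.df and its column names are made up for illustration.

```r
library(GenomicRanges)

# Same ranges as in the question, built directly with the GRanges constructor
df1.gr <- GRanges("chr1",
                  IRanges(start = c(10033259, 10060726, 98674166, 10067579, 10067607, 11169988),
                          end = c(10033289, 10060783, 98674223, 10067654, 10067664, 11170044)),
                  strand = c("-", "-", "+", "+", "+", "+"))
df2.gr <- GRanges("chr1",
                  IRanges(start = c(10024601, 10033258, 10033258),
                          end = c(10038168, 10033323, 10033323)),
                  strand = c("-", "-", "-"))

dfs.ol <- findOverlaps(df1.gr, df2.gr)

# Assumption: overlaps.gr is the pairwise intersection of the hit pairs
overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)],
                          df2.gr[subjectHits(dfs.ol)])

# hits.df is a hypothetical name: one row per hit, with the indices from the
# Hits object next to the coordinates read off overlaps.gr
hits.df <- data.frame(query = queryHits(dfs.ol),
                      subject = subjectHits(dfs.ol),
                      chr = as.character(seqnames(overlaps.gr)),
                      start = start(overlaps.gr),
                      end = end(overlaps.gr),
                      width = width(overlaps.gr))
hits.df
```

With the example data only the first range of df1.gr overlaps anything, and it overlaps all three ranges of df2.gr, so hits.df has three rows, each covering chr1:10033259-10033289.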