Dear All, I am trying to extract overlapping pairs from two GRanges objects. Here, I extract all the pairs that overlap with either the start or the end of the gene. Then I extract the names of the overlapping ranges from their respective data frames by merging on the start and end positions.
overlaped_pairs <- findOverlapPairs(gr3, gr1, type="equal")
# find overlapping range names 2
names2 <- merge(overlaped_pairs@second, df2, by = c("start","end"))
# find overlapping range names 1
names1 <- merge(overlaped_pairs@first, df1, by = c("start","end"))
However, what I would like is to find pairs that are exactly 10 kb apart, so I ran the following code:
overlaped_pairs10kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 10000)
# find overlapping range names 2
names2_10kb <- merge(overlaped_pairs10kb@second, df2, by = c("start","end"))
# find overlapping range names 1
names1_10kb <- merge(overlaped_pairs10kb@first, df1, by = c("start","end"))
But this returns pairs that overlap with either the start or the end of the gene within a tolerance of up to 10,000 bp. However, I want the distance to be exactly 10,000 bp.
So my questions are as follows:
- Is it correct to interpret maxgap as a distance in base pairs?
- Is there a way to find pairs at an exact distance?
- Are there any other packages that could help me with this?
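For the second question, one possible approach is sketched below on toy data: shift one set of ranges by the desired offset and then ask for exact matches. The objects `query`, `subject`, and `offset` are hypothetical stand-ins (not my real gr3 and gr1), and the ranges are unstranded. Note that `type = "equal"` requires both the start and the end to differ by exactly the offset; `type = "start"` or `type = "end"` would compare only one boundary.

```r
library(GenomicRanges)

# Toy, unstranded ranges (hypothetical stand-ins for gr3 and gr1)
query   <- GRanges("chr1", IRanges(start = c(20000, 50000),
                                   end   = c(21000, 51000)))
subject <- GRanges("chr1", IRanges(start = c(30000, 40000, 50500),
                                   end   = c(31000, 41000, 51500)))

offset <- 10000L

# A shifted copy of a query range that matches a subject range exactly
# means that start and end both differ by exactly `offset`
hits_down <- findOverlaps(shift(query,  offset), subject, type = "equal")
hits_up   <- findOverlaps(shift(query, -offset), subject, type = "equal")

# Combine both directions into one table of (query, subject) indices
hits <- rbind(as.data.frame(hits_down), as.data.frame(hits_up))

exact_query   <- query[hits$queryHits]
exact_subject <- subject[hits$subjectHits]
```

On this toy data the third subject range overlaps the second query range but is only 500 bp away, so it is not reported.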
Another idea would be to use a larger maxgap and then filter the pairs by distance:
# use a larger maxgap
overlaped_pairs50kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 50000)
# Extract the ranges of both members of each pair as data frames
second <- overlaped_pairs50kb@second@ranges
second_df <- data.frame(second)
first <- overlaped_pairs50kb@first@unlistData@ranges
first_df <- data.frame(first)
# Calculate the distances between the start and end positions
start_distance <- abs(first_df$start - second_df$start)
end_distance <- abs(first_df$end - second_df$end)
# Check if the start or end distance is exactly 10000
exact_distance_pairs_first <- first_df[(start_distance == 10000 | end_distance == 10000),]
exact_distance_pairs_second <- second_df[(start_distance == 10000 | end_distance == 10000),]
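The same filter might be written with accessor functions instead of direct slot access, which should not depend on how the objects are stored internally. This is only a sketch on toy data: `query` and `subject` are hypothetical stand-ins for gr3 and gr1, and it assumes both members of each pair are plain GRanges.

```r
library(GenomicRanges)

# Toy, unstranded ranges (hypothetical stand-ins for gr3 and gr1)
query   <- GRanges("chr1", IRanges(start = c(20000, 50000),
                                   end   = c(21000, 51000)))
subject <- GRanges("chr1", IRanges(start = c(30000, 40000, 50500),
                                   end   = c(31000, 41000, 51500)))

# Collect candidate pairs with a generous tolerance on start and end
pairs <- findOverlapPairs(query, subject, type = "equal", maxgap = 50000L)

# first() and second() return the two members of each pair
start_distance <- abs(start(first(pairs)) - start(second(pairs)))
end_distance   <- abs(end(first(pairs))   - end(second(pairs)))

# Keep pairs whose start or end differs by exactly 10,000 bp
keep <- start_distance == 10000L | end_distance == 10000L
exact_pairs <- pairs[keep]
```

Subsetting the Pairs object once keeps the two members aligned, so there is no need to filter two separate data frames with the same condition.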
I was wondering if there is a better solution for this, and I would really appreciate some help!
Thanks and best, Shweta