Using GRanges to find overlapping pairs at exactly 10 kb distance
gshweta95 • 0

Dear all, I am trying to extract overlapping pairs from two GRanges objects. First, I extract all pairs that overlap with either the start or the end of the gene. Then I recover the names of the overlapping ranges from their respective data frames by merging on the start and end positions.


overlaped_pairs <- findOverlapPairs(gr3, gr1, type="equal")

# find overlapping range names 2
names2 <- merge(overlaped_pairs@second, df2, by = c("start","end"))

# find overlapping range names 1
names1<- merge(overlaped_pairs@first, df1, by = c("start","end"))


However, what I would actually like is to find pairs that are exactly 10 kb apart, so I ran the following code:


overlaped_pairs10kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 10000)

# find overlapping range names 2
names2_10kb <- merge(overlaped_pairs10kb@second, df2, by = c("start","end"))

# find overlapping range names 1
names1_10kb <- merge(overlaped_pairs10kb@first, df1, by = c("start","end"))


But this returns pairs that overlap with either the start or the end of the gene within up to 10,000 bp. What I want instead is pairs at a distance of exactly 10,000 bp.

So my questions are as follows:

1. Is it right to interpret maxgap as a number of base pairs?
2. Is there a way to find pairs at an exact distance?
3. Are there any other packages that could help me with this?
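
On questions 1 and 2, here is a minimal sketch on toy data of how I understand `maxgap` with the default `type = "any"`: it appears to act as an upper bound on the gap in base pairs, so getting an exact gap needs a second filtering step. The ranges `a` and `b` are made up for illustration, and `distance()` reports the number of positions separating two ranges, which may not be the definition of distance needed for start/end offsets.

```r
library(GenomicRanges)

# Hypothetical toy ranges, not my real data
a <- GRanges("chr1", IRanges(start = 1, end = 100))
b <- GRanges("chr1", IRanges(start = c(5101, 10101, 20101),
                             end   = c(5200, 10200, 20200)))

# maxgap is an upper bound: ranges separated by at most 10000 positions are hits
hits <- findOverlaps(a, b, maxgap = 10000)

# Gap in bp for each hit, computed pairwise on the matched ranges
gap_bp <- distance(a[queryHits(hits)], b[subjectHits(hits)])

# Keep only the hits whose gap is exactly 10000 bp
exact_hits <- hits[gap_bp == 10000]
```

In this toy case the first two ranges in `b` are reported as hits, and only the one separated by exactly 10000 positions survives the filter.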

Another idea would be to use a larger maxgap and then filter the pairs on the distances myself:

# Use a larger maxgap to collect candidate pairs
overlaped_pairs50kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 50000)

# Extract the ranges of each pair member as data frames
second <- overlaped_pairs50kb@second@ranges
second_df <- data.frame(second)

first <- overlaped_pairs50kb@first@unlistData@ranges
first_df <- data.frame(first)

# Calculate the distances between the start and end positions
start_distance <- abs(first_df$start - second_df$start)
end_distance <- abs(first_df$end - second_df$end)

# Keep pairs whose start or end distance is exactly 10000
is_exact <- start_distance == 10000 | end_distance == 10000
exact_distance_pairs_first <- first_df[is_exact,]
exact_distance_pairs_second <- second_df[is_exact,]
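
The same idea can be sketched with accessor functions instead of slots, shown here on toy data. This is only a sketch under assumptions: `query` and `subject` are hypothetical stand-ins for gr3 and gr1, both are taken to be plain unstranded GRanges, and `findOverlaps()` with `maxgap = 10000` is used only to collect candidates (a pair whose start or end offset is exactly 10 kb is always separated by less than 10 kb, so it is among the candidates).

```r
library(GenomicRanges)

# Hypothetical toy ranges standing in for gr3 (query) and gr1 (subject)
query <- GRanges("chr1", IRanges(start = c(1000, 50000),
                                 end   = c(2000, 51000)))
subject <- GRanges("chr1", IRanges(start = c(5000, 11000, 58000),
                                   end   = c(6000, 13000, 61000)))

# Candidate pairs: ranges overlapping or separated by at most 10 kb
hits <- findOverlaps(query, subject, maxgap = 10000)
q <- query[queryHits(hits)]
s <- subject[subjectHits(hits)]

# Offsets between the starts and between the ends of each candidate pair
start_offset <- abs(start(q) - start(s))
end_offset <- abs(end(q) - end(s))

# Keep pairs whose start or end offset is exactly 10 kb
keep <- start_offset == 10000 | end_offset == 10000
exact_pairs <- data.frame(
  query_index   = queryHits(hits)[keep],
  subject_index = subjectHits(hits)[keep],
  start_offset  = start_offset[keep],
  end_offset    = end_offset[keep]
)
```

The index columns in `exact_pairs` point back into the original objects, so names could be looked up by position rather than by merging on start and end.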


I was wondering if there is a better solution for this, and I would really appreciate some help!

Thanks and best, Shweta

genomeIntervals GenomicRanges