Using GRanges to find overlapping pairs at exactly 10 kb distance
gshweta95 • 0

Dear all, I am trying to extract overlapping pairs from two GRanges objects. Here, I extract all the pairs that overlap with either the start or the end of the gene. Then I extract the names of the overlapping ranges from their respective data frames by merging on the start and end positions.

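library(GenomicRanges)

overlaped_pairs <- findOverlapPairs(gr3, gr1, type = "equal")

# find overlapping range names 2 (accessor instead of the @second slot)
names2 <- merge(as.data.frame(second(overlaped_pairs)), df2, by = c("start", "end"))

# find overlapping range names 1 (accessor instead of the @first slot)
names1 <- merge(as.data.frame(first(overlaped_pairs)), df1, by = c("start", "end"))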
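To make this concrete, here is a minimal sketch on made-up ranges (the coordinates and the `gene` and `id` metadata columns are hypothetical, not my real data). As far as I understand, type = "equal" only pairs ranges whose start and end both match, and metadata columns stay attached to each side of the pair, so the names might be read directly without a merge:

```r
library(GenomicRanges)

# Hypothetical toy data: 'gene' and 'id' are made-up metadata columns
gr1 <- GRanges("chr1",
               IRanges(start = c(20001, 100001), end = c(50000, 130000)),
               gene = c("geneA", "geneB"))
gr3 <- GRanges("chr1",
               IRanges(start = c(10001, 100001), end = c(40000, 130000)),
               id = c("x", "y"))

# With the default maxgap, type = "equal" requires identical start AND end
pairs <- findOverlapPairs(gr3, gr1, type = "equal")

# Read the names from the metadata columns of each side of the pair
pair_names <- data.frame(id = first(pairs)$id, gene = second(pairs)$gene)
```

In this toy case only the second query range has the same coordinates as a gene, so `pair_names` holds a single row.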

However, what I would like is to find overlaps that are exactly 10 kb apart, so I ran the following code:

overlaped_pairs10kb <- findOverlapPairs(gr3, gr1, type = "equal", maxgap = 10000)

# find overlapping range names 2 (accessor instead of the @second slot)
names2_10kb <- merge(as.data.frame(second(overlaped_pairs10kb)), df2, by = c("start", "end"))

# find overlapping range names 1 (accessor instead of the @first slot)
names1_10kb <- merge(as.data.frame(first(overlaped_pairs10kb)), df1, by = c("start", "end"))

But this returns pairs whose start or end lies anywhere up to 10,000 bp from the start or end of the gene. However, I want the distance to be exactly 10,000 bp.
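A small made-up example (the coordinates below are hypothetical, not my real data) illustrates what I see: with type = "equal", maxgap appears to act as an upper bound in base pairs on the start and end differences rather than as an exact distance.

```r
library(GenomicRanges)

# Hypothetical genes and query ranges, shifted by 10000, 0 and 5000 bp
gr1 <- GRanges("chr1", IRanges(start = c(20001, 100001, 205001),
                               end   = c(50000, 130000, 235000)))
gr3 <- GRanges("chr1", IRanges(start = c(10001, 100001, 200001),
                               end   = c(40000, 130000, 230000)))

pairs10kb <- findOverlapPairs(gr3, gr1, type = "equal", maxgap = 10000)

# Per-pair differences in start and end coordinates, in bp
start_diff <- abs(start(first(pairs10kb)) - start(second(pairs10kb)))
end_diff   <- abs(end(first(pairs10kb)) - end(second(pairs10kb)))
```

All three toy pairs come back, including the ones shifted by 0 and 5,000 bp, which is the behaviour I would like to avoid.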

So my questions are as follows:

  1. Is it right to consider maxgap in basepairs?
  2. Is there a way to find pairs at exact distance?
  3. Are there any other packages that could help me with this?

Another idea would be to use a larger maxgap and then filter the pairs by distance:

# calculate larger maxgap
overlaped_pairs50kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 50000)

# Extract the ranges of both members of each pair
second <- overlaped_pairs50kb@second@ranges
second_df <- data.frame(second)

first <- overlaped_pairs50kb@first@unlistData@ranges
first_df <- data.frame(first)

# Calculate the distances between the start and end positions
start_distance <- abs(first_df$start - second_df$start)
end_distance <- abs(first_df$end - second_df$end)

# Keep the pairs whose start or end distance is exactly 10000
is_exact <- start_distance == 10000 | end_distance == 10000
exact_distance_pairs_first <- first_df[is_exact, ]
exact_distance_pairs_second <- second_df[is_exact, ]
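The same filtering could also be sketched with accessor functions instead of slots, which keeps the result as a Pairs object. This assumes both inputs are plain GRanges objects (not a GRangesList), and the ranges are again made up for illustration:

```r
library(GenomicRanges)

# Hypothetical ranges; only the first pair is shifted by exactly 10000 bp
gr1 <- GRanges("chr1", IRanges(start = c(20001, 100001, 205001),
                               end   = c(50000, 130000, 235000)))
gr3 <- GRanges("chr1", IRanges(start = c(10001, 100001, 200001),
                               end   = c(40000, 130000, 230000)))

# A generous maxgap collects the candidate pairs
candidates <- findOverlapPairs(gr3, gr1, type = "equal", maxgap = 50000)

# Coordinate differences computed directly on the paired ranges
start_distance <- abs(start(first(candidates)) - start(second(candidates)))
end_distance   <- abs(end(first(candidates)) - end(second(candidates)))

# Subset the Pairs object to the exact-distance pairs
exact_pairs <- candidates[start_distance == 10000 | end_distance == 10000]
```

Because `first(candidates)` and `second(candidates)` are parallel, the logical filter applies to both sides at once and the two halves of each pair cannot get out of step.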

I was wondering if there is a better solution for this, and I would really appreciate some help!

Thanks and best, Shweta

genomeIntervals GenomicRanges • 90 views
