M.Boetzer@lumc.nl
@mboetzerlumcnl-2807
Dear list,
I have a single region with a start and an end, where start < end. I
want to find regions that overlap that region by more than 50%. The
regions to compare against are in a data frame of start and end
positions:
start = 133375983
end = 146245512
data = data.frame(c(133470532, 133966699, 134162735, 134236863,
146225580), c(133754071, 133969713, 134163857, 134249655,156245512))
colnames(data) = c("start2", "end2")
> data
start2 end2
1 133470532 133754071
2 133966699 133969713
3 134162735 134163857
4 134236863 134249655
5 146225580 156245512
I've already written some code that does the trick. However, when
reg1 becomes very large, it slows down considerably:
regfound = c()
reg1 = seq(start, end, 1)
for(i in 1:nrow(data)){
  eq_reg = sum(is.element(seq(data$start2[i], data$end2[i], 1), reg1))
  if(eq_reg != 0)
    regfound = c(regfound,
                 round(eq_reg/((data$end2[i]-data$start2[i])+1)*100, 1))
  else
    regfound = c(regfound, FALSE)
}
> regfound
[1] 100.0 100.0 100.0 100.0 0.2
Does anyone know a faster or more elegant way of doing this?
Thanks in advance,
Marten
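
One possible way to avoid expanding every region with seq() is plain interval arithmetic. The sketch below is only an illustration, not a tested replacement. It assumes the intervals are closed integer ranges (both ends inclusive) and that the percentage is taken relative to the length of each region in data, as in the loop above. The helper name overlap_pct is hypothetical.

```r
# Query region (closed interval, both ends inclusive)
start <- 133375983
end   <- 146245512

data <- data.frame(
  start2 = c(133470532, 133966699, 134162735, 134236863, 146225580),
  end2   = c(133754071, 133969713, 134163857, 134249655, 156245512)
)

# Hypothetical helper: percentage of each [start2, end2] covered by [start, end]
overlap_pct <- function(start, end, start2, end2) {
  # Overlap length runs from the larger start to the smaller end, inclusive;
  # pmax(0, ...) turns the negative value of disjoint intervals into 0
  ov <- pmax(0, pmin(end, end2) - pmax(start, start2) + 1)
  round(ov / (end2 - start2 + 1) * 100, 1)
}

regfound <- overlap_pct(start, end, data$start2, data$end2)
regfound
# [1] 100.0 100.0 100.0 100.0   0.2

# Rows of data that are covered by more than 50%
data[regfound > 50, ]
```

Because pmin() and pmax() are vectorized, the work is one pass over the rows of data and does not depend on the width of the regions, whereas the loop materializes every position of every region.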