Dear BioConductors,
I have two GRanges objects, one of which contains several sequences (let's say motifs) and the other is a single bp (let's say it's a variant):
library(GenomicRanges) variant.position <- 63552595 variant.GR <- GRanges("chr17", IRanges(variant.position, width=1), "+") motifs.GR <- GRanges("chr17", IRanges(c(63552587, 63552589), width = 10), c("+"), motif=c("MYC", "TCF4"))
I'd like to get the relative position of the variant inside the motifs that it matches. I know I can use findOverlaps and then subtract the start position of the motif from the variant position:
hits <- findOverlaps(variant.GR, motifs.GR) relative.position <- variant.position - start(motifs.GR[subjectHits(hits)]) + 1 names(relative.position) <- motifs.GR[subjectHits(hits)]$motif
But I'm aware that this approach is using three steps, i.e. subsetting the GRanges object, extracting the start position and subtracting it from the variant position. Besides, I will have to reverse the calculation for the motifs on the negative strand. Given that I'm likely to run it on millions of variants and hundreds of thousands of motifs, I'm concerned about the computational overheads and wondering if there's a more efficient way to get that relative position. Ideally, the function would take the above objects and return the index of the motif containing the variant and the position of the variant inside the motif, ideally, in a strand-specific manner.
Many thanks!
It seems to be just what I needed. Thanks a lot!