library(GenomicRanges)
gr <-
GRanges(
Rle(c('chr17', 'chr5')),
IRanges(c(7573997,112173451),c(7573997,112173451))
)
If I want to get the hg19 transcripts associated with these ranges I can do for instance:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
tx <- TxDb.Hsapiens.UCSC.hg19.knownGene
transcriptsByOverlaps(tx, gr, columns=c("tx_name", "CDSID"))
GRanges object with 17 ranges and 2 metadata columns:
seqnames ranges strand | tx_name CDSID
<Rle> <IRanges> <Rle> | <character> <IntegerList>
[1] chr5 [112043202, 112181936] + | uc011cvt.2 65327,65329,65330,...
[2] chr5 [112073556, 112181936] + | uc003kpy.4 NA,65328,65329,...
[3] chr5 [112073556, 112181936] + | uc003kpz.4 NA,NA,65328,...
[4] chr5 [112073556, 112181936] + | uc010jbz.3 NA,NA,NA,...
[5] chr5 [112157593, 112181936] + | uc010jca.3 NA,NA,65344
... ... ... ... ... ... ...
[13] chr17 [7571720, 7590868] - | uc002gij.3 NA,NA,NA,...
[14] chr17 [7571720, 7590868] - | uc002gim.3 NA,183308,183307,...
[15] chr17 [7571720, 7590868] - | uc010cnh.2 NA,183308,183307,...
[16] chr17 [7571720, 7590868] - | uc010cni.2 NA,183308,183307,...
[17] chr17 [7571720, 7590868] - | uc031qyq.1 NA,NA,183305,...
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
But I can not figure out how to get the amino acids codons at precisely these positions?

Couple of minor points:
seqnames(my_hits)instead of coercing to a data.frame?select()accept an Rle so the character coercion is unnecessary.data.frame(Final, annotation, codon_no, aa), which is perhaps easier to read, but you might consider adding the last three columns to themcols(Final), so that the GRanges is preserved. You can always convert to data.frame later when absolutely needed.Sure - thanks for the comments (many of the Ranges methods are still somewhat new to me and I need to get used using them).
Thanks so much for this extensive reply.
For people continuing to struggle with this topic, the
ensembldbpackage has some nice functionality for this:To comment on the above, I believe that the transcript is represented in
cdsas aGRangesListobject in which eachGRangesis composed of a set of ranges corresponding to the transcript exons. In that case, I think that the base pair positions identified inmy_hitswould be exon-relative positions, which would need to be assembled and translated into transcript-relative positions if the transcript has more than one exon.