Consider this code:
library(microbenchmark)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

microbenchmark::microbenchmark(
  # Option 1: query only the exons of the two transcripts of interest
  GenomicFeatures::exons(
    txdb,
    vals = list(tx_name = c("uc001aaa.3", "uc010nxq.1")),
    columns = c("EXONNAME", "TXNAME", "GENEID")),  # columns takes a character vector
  # Option 2: group the exons of every transcript, then subset by name
  GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)[c("uc001aaa.3", "uc010nxq.1")],
  times = 10
)
# Option 1 takes an average of 1.6 seconds
# Option 2 takes an average of 6.5 seconds
These two options give me the same exons, but Option 1 is much faster. Is there an easy way to use Option 1 and still get the structure that Option 2 returns? Or is there a way to make exonsBy() faster by extracting only the transcripts I'm interested in?
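As far as I know, exonsBy() has no argument for restricting the query to particular transcripts, so most of Option 2's time goes into building the list for every transcript in the TxDb. If you look up transcripts repeatedly in one session, a simple workaround is to pay that cost once and subset the cached result by name. This is a minimal sketch under that assumption; `all_ex` and `get_tx_exons()` are hypothetical names, not part of GenomicFeatures.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Pay the exonsBy() cost once per session (the slow step in Option 2)
all_ex <- exonsBy(txdb, by = "tx", use.names = TRUE)

# Hypothetical helper: each later lookup is only a name-based subset
get_tx_exons <- function(tx_names) {
  all_ex[tx_names]
}

res <- get_tx_exons(c("uc001aaa.3", "uc010nxq.1"))
```

This does not make the first call any faster; it only helps when the same TxDb is queried many times.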
My end goal is a GRangesList with one GRanges per transcript, or, if that's possible, a single GRanges with duplicate entries for exons that appear in multiple transcripts. To begin with, I only have the transcript names and the TxDb object.
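One way to keep Option 1's speed and still get both target structures is to expand its result yourself: exons() returns one range per exon, with `tx_name` as a list column, so repeating each exon once per transcript name gives the single GRanges with duplicates, and split() on that name gives a GRangesList. This is a sketch, assuming a recent GenomicFeatures release where the restriction argument is called `filter` (older releases, as in the benchmark above, used `vals`); `wanted`, `ex_long` and `grl` are illustrative names. Unlike exonsBy(), it does not retrieve `exon_rank`, so exon order within each transcript may differ.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
wanted <- c("uc001aaa.3", "uc010nxq.1")

# Fast query (Option 1 style): one range per exon, tx_name is a CharacterList
ex <- exons(txdb,
            columns = c("exon_name", "tx_name", "gene_id"),
            filter = list(tx_name = wanted))

# Repeat each exon once for every transcript it belongs to
tx <- mcols(ex)$tx_name
ex_long <- ex[rep(seq_along(ex), lengths(tx))]
mcols(ex_long)$tx_name <- unlist(tx, use.names = FALSE)

# Keep only the transcripts asked for (shared exons may list others too)
ex_long <- ex_long[mcols(ex_long)$tx_name %in% wanted]

# Single GRanges with duplicated exons: ex_long
# GRangesList with one GRanges per transcript, in the requested order:
grl <- split(ex_long, mcols(ex_long)$tx_name)[wanted]
```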