Consider this code:
    library(microbenchmark)
    library(GenomicFeatures)
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)

    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

    microbenchmark::microbenchmark(
      # Option 1:
      GenomicFeatures::exons(
        txdb,
        vals = list(tx_name = c("uc001aaa.3", "uc010nxq.1")),
        columns = list("EXONNAME", "TXNAME", "GENEID")),
      # Option 2:
      GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)[c("uc001aaa.3", "uc010nxq.1")],
      times = 10
    )
    # Option 1 takes an average of 1.6 seconds
    # Option 2 takes an average of 6.5 seconds
These two options get me the same data, but Option 1 is a lot faster. Is there an easy way I can use Option 1 and still get the structure that Option 2 returns? Or is there a way to make exonsBy() faster by only extracting the transcripts I'm interested in?
My end goal is to get a GRangesList object with one GRanges object per transcript, OR a single GRanges object with duplicate entries for exons that appear in multiple transcripts (if that's possible). To begin with I only have the transcript names and the TxDb object.
Thanks for answering. I started working on an example to help explain what I want to accomplish:
The goal again is to get a structure similar to the one I get with exonsBy() -> unlist(). Any ideas?
I agree that the TxDb accessors have major usability issues. Probably my least favorite thing in Bioconductor. I agree that a restriction by TXNAME should not return other values in the TXNAME column.
To expand, just use the expand() function.
With regard to (the inefficient) Approach 1, I am surprised that Gviz cannot plot a GRangesList directly.
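For anyone reading along, here is a rough sketch of how the expand() suggestion might be combined with Option 1 to get an exonsBy()-like structure. It is a sketch under assumptions, not a tested recipe: it uses the `filter` argument of exons(), which as far as I know replaces the `vals` argument from the original post in newer GenomicFeatures releases (swap it back to `vals` on an old installation). The object names `tx_names`, `ex`, `ex_expanded` and `ex_by_tx` are made up for illustration.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
tx_names <- c("uc001aaa.3", "uc010nxq.1")  # transcripts of interest

# Option 1 from the question: only exons belonging to the requested transcripts.
# Assumption: `filter` is the current name of the argument the post calls `vals`.
ex <- exons(txdb,
            filter = list(tx_name = tx_names),
            columns = c("EXONNAME", "TXNAME", "GENEID"))

# TXNAME is a list-like column (one exon can belong to several transcripts).
# expand() repeats each exon once per TXNAME entry, giving one row per
# exon/transcript pair.
ex_expanded <- S4Vectors::expand(ex, colnames = "TXNAME")

# The TXNAME column can still contain transcripts that were not requested,
# so keep only the rows for the transcripts of interest.
ex_expanded <- ex_expanded[ex_expanded$TXNAME %in% tx_names]

# Optional: split into a GRangesList with one element per transcript.
ex_by_tx <- split(ex_expanded, ex_expanded$TXNAME)
```

Here `ex_expanded` is the single GRanges with duplicated exons, and `ex_by_tx` is the GRangesList form. One caveat: I would not assume the exons within each element come back in the same order as exonsBy(txdb, by = "tx") returns them, so sort explicitly if the order matters.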
Thank you! I was not aware of the expand() function.
I still run into problems with Gviz. It's not GRangesList I'm trying to plot, it's GRanges. Do you see what I'm doing wrong?
It looks like maybe you need to pass trTrack to Gviz::plotTracks() instead of tst2.
Truly embarrassing. Thank you!