Consider this code:
library(microbenchmark)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
microbenchmark::microbenchmark(
# Option 1:
GenomicFeatures::exons(
txdb,
vals = list(tx_name = c("uc001aaa.3", "uc010nxq.1")),
columns = list("EXONNAME", "TXNAME", "GENEID")),
# Option 2:
GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)[c("uc001aaa.3", "uc010nxq.1")],
times = 10
)
# Option 1 takes an average of 1.6 seconds
# Option 2 takes an average of 6.5 seconds
These two options gets me the same data, but option1 is a lot faster. Is there an easy way I can use option1 and still get the structure that is option2? Or is there a way to make exonsBy faster by only extracting the transcripts I'm interested in?
My end goal is to get a GRangesList object with one GRanges object per transcript, OR a single GRanges object with duplicate entries for exons that appear in multiple transcripts (if that's possible). To begin with I only have the transcript names and the TxDb object.

Thanks for answering. I started working on an example to help explain what I want to accomplish:
library(Gviz) library(GenomicFeatures) options(ucscChromosomeNames=FALSE) txdbhg19 <- GenomicFeatures::makeTxDbFromBiomart( biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "dec2013.archive.ensembl.org") transcriptNames <- c("ENST00000370032", "ENST00000420055") # ----- APPROACH 1 # Get GrangesList tst <- exonsBy(txdbhg19, by = "tx", use.names=TRUE)[transcriptNames] # Unlist to get GRanges tst <- unlist(tst) # Add transcript metadata column to make Gviz group correctly elementMetadata(tst)$transcript <- names(tst) # Create track and plot trTrack <- Gviz::GeneRegionTrack(tst) Gviz::plotTracks(trTrack) # This is the plot I want to produce # ----- APPROACH 2 tst2 <- GenomicFeatures::exons( txdbhg19, vals = list(tx_name = transcriptNames), columns = list("EXONNAME", "TXNAME", "GENEID")) # Problem 1: The TXNAME metadata field now contains transcripts that are not in my original transcript list: unique(unlist(elementMetadata(tst2)$TXNAME[!elementMetadata(tst2)$TXNAME %in% transcriptNames])) # [1] "ENST00000370031" "ENST00000402983" # I can hack this out, but it feels unclean to do this: elementMetadata(tst2)$TXNAME <- elementMetadata(tst2)$TXNAME[elementMetadata(tst2)$TXNAME %in% transcriptNames] # I would much rather have it so that I only get the transcript names that I originally wanted. Anyhow, let's continue.. # Each entry in elementMetadata(tst2)$TXNAME now has 1 or more transcript names in it. This is not what I want. I want to somehow "expand" this GRanges object, so that each row only have one TXNAME value. Here's a very naive way to do it: gr <- GRanges() for (i in 1:length(tst2)) { trNames <- elementMetadata(tst2[i])$TXNAME[[1]] trNames <- strsplit(trNames, " ") for (j in 1:length(trNames)) { newGr <- tst2[i] elementMetadata(newGr)$TXNAME <- trNames[[j]] gr <- append(gr, newGr) print(paste(elementMetadata(tst2[i])$GENEID, trNames[[j]])) } } elementMetadata(gr)$transcript <- elementMetadata(gr)$TXNAME trTrack <- Gviz::GeneRegionTrack(gr) Gviz::plotTracks(gr) # Error in (function (classes, fdef, mtable) : # unable to find an inherited method for function ‘displayPars<-’ for signature ‘"GRanges", "list"’ # Why do I get this error? I was hoping that this gr structure would be similar to the tst structure in approach 1The goal again is to get a structure similar to the one I get with exonsBy() -> unlist(). Any ideas?
I agree that the TxDb accessors have major usability issues. Probably my least favorite thing in Bioconductor. I agree that a restriction by
TXNAMEshould not return other values in theTXNAMEcolumn.To expand, just use the
expand()function.With regard to (the inefficient) Approach 1, I am surprised that Gviz cannot plot a GRangesList directly.
Thank you! I was not aware of the expand() function.
I still run into problems with Gviz. It's not GRangesList I'm trying to plot, it's GRanges. Do you see what I'm doing wrong?
It looks like maybe you need to pass
trTracktoGviz::plotTracks()instead oftst2.Truly embarrassing. Thank you!