Question: Getting a list of all tRNA encoding genes for an organism?
4.1 years ago by Keith Hughitt (United States):

Hello,

Does anyone know of a way to get a list of ENSEMBL gene identifiers for all known tRNA encoding genes for a given species?

At the moment, I am interested in generating such lists for human and mouse.

For other types of genes (rRNAs, snoRNAs, etc.) I am able to use biomaRt to find all such genes, for example:

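library(biomaRt)
# Connect to the Ensembl BioMart and select the human gene dataset
ensembl_mart = useMart(biomart="ensembl")
biomart = useDataset('hsapiens_gene_ensembl', mart=ensembl_mart)

# Retrieve the biotype of every Ensembl gene
biomart_genes = getBM(attributes=c("ensembl_gene_id", "gene_biotype"), mart=biomart)
# Look up the biotype of each gene in my_genes by its Ensembl id
my_genes$type = biomart_genes$gene_biotype[match(my_genes$ENSEMBL, biomart_genes$ensembl_gene_id)]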

...where "my_genes" is some dataframe of genes (e.g. a count table) with an id field called "ENSEMBL"

tRNAs do not have their own gene_biotype grouping, and therefore a separate approach is needed to find them.

I stumbled across Marc's FDb.UCSC.tRNAs database package and was able to figure out how to at least get a list of tRNA genes:

library('FDb.UCSC.tRNAs')
names(features(FDb.Hsapiens.UCSC.hg19.tRNAs))

However, I'm not sure how to map from these entries back to my ENSEMBL gene ids.

Anyone know of a better way?

Keith

modified 4.1 years ago by Hervé Pagès • written 4.1 years ago by Keith Hughitt

Hi Keith

I was in a similar situation a few months ago, when I was working with the Ensembl GTF files (for mouse and human), which only contain the mitochondrial tRNA genes. I approached the Ensembl helpdesk and was advised to use the Perl API to get them.

Hans-Rudolf

modified 4.1 years ago • written 4.1 years ago by Hotz, Hans-Rudolf

Hi Hans-Rudolf,

Thanks for the suggestion -- I will check out the Perl API.

Did you have any luck using it to query tRNAs?

Yes, it worked very well for my needs. The trick was:

$slice->get_all_SimpleFeatures('tRNAscan')

Hans-Rudolf

written 4.1 years ago by Hotz, Hans-Rudolf

Answer: Getting a list of all tRNA encoding genes for an organism?
4.1 years ago by Johannes Rainer (Italy):

Hi Keith!

You could try to use the Ensembl annotation packages (e.g. EnsDb.Hsapiens.v75) that I have submitted to Bioconductor and that are currently in Bioc-devel (they should be available in Bioconductor's next release). Basically, I'm using the Perl Ensembl API to get the gene/transcript models defined in Ensembl and store them, along with some additional information (i.e. gene name, gene biotype, transcript biotype), in an SQLite database included in the above-mentioned annotation packages.

To get a list of all Ensembl gene ids:

library(EnsDb.Hsapiens.v79)
## you could use listGenebiotypes to get an overview of all available
## gene biotypes in Ensembl
listGenebiotypes(EnsDb.Hsapiens.v79)
## get all genes with gene biotype "Mt_tRNA"... seems to be the only
## tRNA biotype in Ensembl
genes(EnsDb.Hsapiens.v79, filter=list(GenebiotypeFilter("Mt_tRNA")))

As I mentioned, the package is still in Bioc-devel, so you'll either have to install the current devel version or wait for the next Bioconductor release (which will be on April 17th). Also, I'm not sure if ALL tRNA genes can be fetched like this, as there seems to be only the gene biotype "Mt_tRNA" defined in Ensembl, so it will only return mitochondrial tRNAs.

cheers, jo

written 4.1 years ago by Johannes Rainer

Ah, sorry. Apparently I didn't read the full message first... so it seems my option above would only be an alternative to the biomart approach. To map the features from FDb.UCSC.tRNAs, you could use their start, end and seqnames to query biomart for a gene matching these. I tried that using the EnsDb.Hsapiens.v75 package but got a match for only 3 out of the 625.
I checked some tRNAs manually on the Ensembl web page and apparently they map to introns of (protein coding) genes. The code I used (it might also be possible to do this with biomart):

library('FDb.UCSC.tRNAs')
library(EnsDb.Hsapiens.v75)
ensdb <- EnsDb.Hsapiens.v75
tRNAs <- features(FDb.Hsapiens.UCSC.hg19.tRNAs)
## one entry per tRNA; stays "" when no gene matches
Ensgenes <- character(length(tRNAs))
for(i in 1:length(tRNAs)){
    ## genes on the same chromosome and strand whose start and end
    ## fall inside the coordinates of the i-th tRNA
    Gene <- genes(ensdb, filter=list(
        SeqstartFilter(start(tRNAs)[i], condition=">=", feature="gene"),
        SeqendFilter(end(tRNAs)[i], condition="<=", feature="gene"),
        SeqnameFilter(sub(as.character(seqnames(tRNAs)[i]), pattern="chr", replacement="")),
        SeqstrandFilter(as.character(strand(tRNAs[i])))
    ))
    if(length(Gene)>0){
        Ensgenes[i] <- paste(unique(Gene$gene_id), collapse=";")
    }
}
## number of tRNAs with at least one matching gene
sum(Ensgenes!="")
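
The same coordinate matching can probably be done in a single call instead of one database query per tRNA. The following is only a minimal sketch on toy data, assuming both the tRNA features and the gene models are available as GRanges objects. The coordinates, the gene ids (geneA, geneB, geneC) and the object names trnas, gene_models and ens_genes are hypothetical and not taken from the packages above.

```r
library(GenomicRanges)

# Hypothetical toy coordinates standing in for the UCSC tRNA features
trnas <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(150, 900, 50), end = c(220, 970, 120)),
    strand = c("+", "-", "+"))

# Hypothetical toy gene models using Ensembl-style chromosome names
gene_models <- GRanges(
    seqnames = c("1", "1", "2"),
    ranges = IRanges(start = c(150, 800, 60), end = c(220, 1000, 110)),
    strand = c("+", "-", "-"),
    gene_id = c("geneA", "geneB", "geneC"))

# Rename "chr1" -> "1" so that both objects share sequence names
seqlevels(trnas) <- sub("^chr", "", seqlevels(trnas))

# Genes lying entirely within a tRNA range on the same strand
# (type = "within" means the query is contained in the subject)
hits <- findOverlaps(gene_models, trnas, type = "within")

# Collapse the matching gene ids per tRNA; "" where nothing matched
ids_by_trna <- split(gene_models$gene_id[queryHits(hits)], subjectHits(hits))
ens_genes <- character(length(trnas))
ens_genes[as.integer(names(ids_by_trna))] <-
    vapply(ids_by_trna, function(x) paste(unique(x), collapse = ";"), character(1))

# Number of tRNAs with at least one matching gene
sum(ens_genes != "")
```

Swapping the first two arguments, i.e. findOverlaps(trnas, gene_models, type = "within"), would instead report genes that enclose each tRNA, which may be the more useful direction when tRNAs sit inside introns of other genes.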

Thanks for the suggestion, Johannes!

That is strange about the tRNAs mapping to introns. Did you use ENSEMBL version 75 or 79? In your code above I see both versions. If you used 75, perhaps the coordinates differ, since UCSC is currently on GRCh38 (~> E79)? I'm not sure the change would be so dramatic, though. More likely it is just a lack of understanding of the UCSC table annotations.

Also, I tried installing the database package you put together in R-devel (Bioconductor version 3.1, BiocInstaller 1.17.6, R version 3.3.0), but the 'ensembldb' dependency could not be found. Any suggestions?

I have now also tried with version 79, but didn't find anything there. Actually, hg19 corresponds to GRCh37, so Ensembl 75 was OK. I rather believe that Ensembl does not have the tRNAs defined as "genes".

Regarding the ensembldb package, yes you're right :) there is a problem that the dependency is not (yet) available, but I hope Marc is fixing that soon.

Hi Johannes,

Your assumption is correct; we don't annotate tRNAs as genes in Ensembl. They are stored in the simple_feature table in the Core MySQL database. As Hans pointed out earlier, they can be accessed via our API.

Hope this helps.

Cheers,
Amonida

--
Amonida Zadissa
Ensembl Production

Answer: Getting a list of all tRNA encoding genes for an organism?
4.1 years ago by Martin Morgan (United States):

The coordinates can be retrieved via SQL queries, provided one has a MySQL client installed.

library(dplyr)
library(GenomicRanges)

Connect to the database

db <- src_mysql("homo_sapiens_core_79_38",
                "useastdb.ensembl.org", 3306, "anonymous")

Figure out the analysis id corresponding to the tRNA scan

analysis_id <- tbl(db, "analysis_description") %>%
    filter(display_label=="tRNAs")

Select the simple features corresponding to this analysis

features <- semi_join(tbl(db, "simple_feature"), analysis_id,
                      by="analysis_id")

Get the chromosome name

# keep only the id and name columns of seq_region, then join on the id
seq_name <- tbl(db, "seq_region") %>% select(seq_region_id, name)
features <- inner_join(features, seq_name, by="seq_region_id")

Make into a GRanges

features %>% makeGRangesFromDataFrame(
    keep.extra.columns=TRUE, seqnames.field="name",
    start.field="seq_region_start", end.field="seq_region_end",
    strand.field="seq_region_strand")

dplyr and the published schema made this relatively easy to explore. I think it doesn't get to the original question, which is to annotate these with ENS identifiers, if these actually exist. The SQL server seemed to be quite flaky, with frequent time-outs and erratic performance.
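
Since the public server can be unreliable, the join logic of the steps above can also be tried offline. This is only a sketch in base R on made-up rows: the table and column names follow the Ensembl core schema as used in the queries above, but every value (ids, coordinates, the "CpG islands" label) is hypothetical.

```r
# Hypothetical toy rows; table and column names follow the Ensembl core schema
analysis_description <- data.frame(
    analysis_id = c(1L, 2L),
    display_label = c("tRNAs", "CpG islands"),
    stringsAsFactors = FALSE)
simple_feature <- data.frame(
    simple_feature_id = 1:4,
    seq_region_id = c(10L, 10L, 11L, 11L),
    seq_region_start = c(100L, 500L, 200L, 900L),
    seq_region_end = c(172L, 650L, 273L, 1100L),
    seq_region_strand = c(1L, 1L, -1L, -1L),
    analysis_id = c(1L, 2L, 1L, 2L))
seq_region <- data.frame(
    seq_region_id = c(10L, 11L),
    name = c("1", "2"),
    stringsAsFactors = FALSE)

# Step 1: analysis ids whose display label is "tRNAs"
trna_ids <- analysis_description$analysis_id[
    analysis_description$display_label == "tRNAs"]

# Step 2: keep only simple features produced by that analysis (a semi-join)
trna_features <- simple_feature[simple_feature$analysis_id %in% trna_ids, ]

# Step 3: attach the chromosome name via seq_region_id (an inner join)
trna_features <- merge(trna_features, seq_region, by = "seq_region_id")

trna_features[, c("name", "seq_region_start", "seq_region_end",
                  "seq_region_strand")]
```

The resulting data frame has the same columns that the makeGRangesFromDataFrame() call above expects, so the last step of the pipeline could be exercised on it as well.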

modified 4.1 years ago • written 4.1 years ago by Martin Morgan
Answer: Getting a list of all tRNA encoding genes for an organism?
4.1 years ago by Hervé Pagès (United States):

Hi,

FWIW it doesn't seem that Ensembl uses a consistent approach to tag tRNAs. I guess it depends on the organism. For some organisms (e.g. Fly) it looks like the information is available via the transcript_biotype BioMart attribute. Starting with BioC 3.1, makeTxDbFromBiomart() imports that attribute in the tx_type column. Note that this is a new TxDb column that you can then extract from the TxDb object using the columns arg of the transcripts() extractor.

See for example:

A: Does BSgenome.Dmelanogaster.UCSC.dm2 contain non-coding RNAs?

for how to use makeTxDbFromBiomart() on Fly and get the tx_type for each transcript (314 tRNAs for Fly).

Unfortunately, as Johannes noticed previously, things don't work so well for Human where only mitochondrial tRNAs (Mt_tRNA) seem to be tagged:

library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")
tx <- transcripts(txdb, columns=c("tx_name", "gene_id", "tx_type"))
grep("tRNA", unique(mcols(tx)$tx_type), ignore.case=TRUE, value=TRUE)
# [1] "vaultRNA" "Mt_tRNA"


See:

for more details about the new tx_type feature.

H.

modified 4.1 years ago • written 4.1 years ago by Hervé Pagès