Getting HGNC gene names from Ensembl transcript IDs (e.g., ENST0000...)
kmuench ▴ 40
@kmuench-9243
Last seen 2.9 years ago
United States

Hello,

In R, I previously used this piece of code to look up Entrez IDs and HGNC symbols for lists of Ensembl gene IDs (beginning with ENSG000...). In this example, my_df is a data frame whose row names are the gene IDs (e.g. ENSG...):

  library(biomaRt)
  ## keep the part of each row name before any "+" (the Ensembl gene ID)
  my_df$ensembl <- sapply( strsplit( rownames(my_df), split="\\+" ), "[", 1 )
  ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="www.ensembl.org") # reflects recent change to hosting, as discussed in https://support.bioconductor.org/p/74322/
  ## note: newer Ensembl releases name this attribute "entrezgene_id"
  genemap <- getBM( attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                    filters = "ensembl_gene_id",
                    values = my_df$ensembl,
                    mart = ensembl )
  ## match() keeps only the first hit when one gene ID maps to several rows
  idx <- match( my_df$ensembl, genemap$ensembl_gene_id )
  my_df$entrez <- genemap$entrezgene[ idx ]
  my_df$hgnc_symbol <- genemap$hgnc_symbol[ idx ]

I'd now like to use this on a data frame where the input row names are transcript IDs (e.g. ENST000...). I'm not sure whether I can do this with BioMart; does anyone know?

rnaseq r biomart • 6.0k views

The Ensembl mart also provides transcript IDs (via the ensembl_transcript_id attribute) so I don't see why you couldn't do the same with transcript IDs instead of gene IDs. Use listAttributes() to list all the attributes available for your dataset.

H.
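Following that suggestion, here is a minimal sketch of the question's code adapted to transcript IDs. It assumes the row names of my_df start with an ENST... ID (optionally followed by "+...", as the original strsplit() call implies) and that the hsapiens_gene_ensembl dataset offers ensembl_transcript_id as both a filter and an attribute. The helper names tx_ids_from_rownames(), fetch_tx_map() and annotate_by_tx() are hypothetical, chosen for illustration.

```r
## Sketch: the lookup from the question, keyed on Ensembl transcript IDs.
## Assumption: row names look like "ENST..." optionally followed by "+...".

## Hypothetical helper: take the part of each row name before any "+".
tx_ids_from_rownames <- function(x) {
  vapply(strsplit(x, split = "\\+"), "[", character(1), 1)
}

## Hypothetical helper: query BioMart for transcript -> gene mappings
## (requires the biomaRt package and network access).
fetch_tx_map <- function(tx_ids) {
  mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                           dataset = "hsapiens_gene_ensembl",
                           host = "www.ensembl.org")
  biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                                "hgnc_symbol"),
                 filters = "ensembl_transcript_id",
                 values = tx_ids,
                 mart = mart)
}

## Hypothetical helper: add gene ID and HGNC symbol columns;
## transcripts missing from 'txmap' get NA.
annotate_by_tx <- function(my_df, txmap) {
  my_df$ensembl_tx <- tx_ids_from_rownames(rownames(my_df))
  idx <- match(my_df$ensembl_tx, txmap$ensembl_transcript_id)
  my_df$ensembl_gene <- txmap$ensembl_gene_id[idx]
  my_df$hgnc_symbol <- txmap$hgnc_symbol[idx]
  my_df
}
```

Usage would be something like my_df <- annotate_by_tx(my_df, fetch_tx_map(tx_ids_from_rownames(rownames(my_df)))). Keeping the network query separate from the matching step lets you save the BioMart result and reuse it. To confirm the attribute and filter names for your Ensembl release, call listAttributes() and listFilters() on a mart created with useMart() as above.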

Johannes Rainer ★ 1.9k
@johannes-rainer-6987
Last seen 4 months ago
Italy

You could also use the ensembldb package to get the mapping between transcript IDs and gene names (HGNC):

library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75

## Get all transcripts defined in Ensembl (version 75):
tx <- transcripts(edb, columns=c("tx_id", "gene_id", "gene_name"))

## Then extract the transcript IDs and gene names into a mapping matrix:
mapping <- cbind(tx_id=tx$tx_id, name=tx$gene_name)
rownames(mapping) <- mapping[, 1]
head(mapping)

> head(mapping)
                tx_id             name    
ENST00000373020 "ENST00000373020" "TSPAN6"
ENST00000494424 "ENST00000494424" "TSPAN6"
ENST00000496771 "ENST00000496771" "TSPAN6"
ENST00000373031 "ENST00000373031" "TNMD"  
ENST00000485971 "ENST00000485971" "TNMD"  
ENST00000371582 "ENST00000371582" "DPM1"  

 

hope that helps

cheers, jo
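To connect this answer back to the question, the sketch below attaches gene names from the mapping matrix above to a data frame like my_df. It assumes the row names are unversioned ENST IDs (as in EnsDb.Hsapiens.v75), optionally followed by "+..." as in the question's code; add_gene_names() is a hypothetical helper name. It uses match() rather than indexing the matrix by row name, because mapping[tx, "name"] stops with "subscript out of bounds" as soon as one ID is missing from Ensembl 75.

```r
## Sketch: attach gene names from the 'mapping' matrix (columns "tx_id"
## and "name") to a data frame whose row names are transcript IDs.
## add_gene_names() is a hypothetical helper, not part of ensembldb.
add_gene_names <- function(my_df, mapping) {
  ## keep the part of each row name before any "+"
  tx <- vapply(strsplit(rownames(my_df), split = "\\+"), "[", character(1), 1)
  ## match() gives NA for IDs absent from the mapping instead of an error
  idx <- match(tx, mapping[, "tx_id"])
  my_df$hgnc_symbol <- unname(mapping[idx, "name"])
  my_df
}
```

Transcripts that were introduced after Ensembl release 75 will simply come back as NA with this approach.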


Thank you for the suggestion! I'm having trouble with the install (getting a "there is no package called ‘EnsDb.Hsapiens.v75’" message) but this looks like what I want - I'll keep trying.


For posterity: my issue is specifically that the library doesn't seem to be available yet for my version of R (error: "package ‘ensembldb’ is not available (for R version 3.1.2)"). Currently working through these possibilities: http://stackoverflow.com/questions/25721884/how-should-i-deal-with-package-xxx-is-not-available-for-r-version-x-y-z-wa


Check out the ensembldb landing page. In the gray-and-white striped 'Details' section it says the package has been in Bioconductor since BioC 3.1 (R-3.2). To use the package, simply install the current version of R (R-3.2.2) and follow the usual source() / biocLite() instructions on the landing page. The package will never be made available for an older version of R than the version it was introduced in, so the 'yet' in your comment is too optimistic!
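A minimal sketch of that check in code, assuming (per the comment above) that ensembldb requires R >= 3.2. The install step uses BiocManager, which later Bioconductor releases adopted in place of source()/biocLite(); as far as we know BiocManager itself targets R >= 3.5.0, so older R versions should keep using the landing-page instructions. The function names ensembldb_supported() and install_ensembldb() are hypothetical.

```r
## Hypothetical helper: is this R version new enough for ensembldb?
## ensembldb entered Bioconductor in BioC 3.1, which requires R >= 3.2.
ensembldb_supported <- function(rversion = getRversion()) {
  numeric_version(as.character(rversion)) >= "3.2.0"
}

## Hypothetical helper: install ensembldb and the v75 annotation package.
## Assumption: BiocManager is used on R >= 3.5.0; older R should follow the
## source()/biocLite() instructions on the landing page instead.
install_ensembldb <- function(pkgs = c("ensembldb", "EnsDb.Hsapiens.v75")) {
  if (!ensembldb_supported())
    stop("ensembldb needs R >= 3.2.0; found R ", getRversion())
  if (getRversion() < "3.5.0")
    stop("use the source()/biocLite() instructions for this older R")
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(pkgs)
}
```

For the R 3.1.2 mentioned earlier in the thread, ensembldb_supported("3.1.2") is FALSE, which matches the "not available" error.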


Haha! Thank you for pointing that out!
