Transcript name mapping between human and mouse
Eric (Hong Kong)

Hello, forum,

I want to use biomaRt to map mouse transcript IDs such as ENSMUST00000159265.1 to human transcript IDs (the ones starting with "ENST"), but my attempts failed. Is something wrong with my code, or is the idea unrealistic?

My code is as follows:


library("biomaRt")
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/") 
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
# View(listAttributes(mouse))
# View(listAttributes(human))
getLDS(attributes = c("mgi_trans_name"), filters = "mgi_trans_name", 
       values = c("ENSMUST00000159265.1") , mart = mouse, 
       attributesL = c("hgnc_trans_name"), martL = human, 
       uniqueRows=T)
getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", 
       values = c("Xkr4") , mart = mouse, 
       attributesL = c("hgnc_symbol"), martL = human, 
       uniqueRows=T)
getLDS(attributes = c("mgi_id"), filters = "mgi_id", 
       values = c("ENSMUST00000159265.1") , mart = mouse, 
       attributesL = c("hgnc_id"), martL = human, 
       uniqueRows=T)
@james-w-macdonald-5106 (United States)

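That's an Ensembl transcript ID, not an MGI name or ID, so the mgi_* attributes and filters won't match it. You also have the wrong version suffix for the archive you are querying. Here's how you can figure that out.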

## just do a simple lookup first, without specifying the version
> getBM("ensembl_transcript_id_version", "ensembl_transcript_id", "ENSMUST00000159265", mouse)
  ensembl_transcript_id_version
1          ENSMUST00000159265.2

## now do the lookup using the correct attributes and filters, with the correct version for that archived database
> getLDS("ensembl_transcript_id_version", "ensembl_transcript_id_version", "ENSMUST00000159265.2", mouse, "ensembl_transcript_id_version", martL = human)
  Transcript.stable.ID.version Transcript.stable.ID.version.1
1         ENSMUST00000159265.2              ENST00000327381.7
2         ENSMUST00000159265.2              ENST00000518261.1
3         ENSMUST00000159265.2              ENST00000622811.1

## or you could be version agnostic, so you don't have to make sure you have the right archive version

> getLDS("ensembl_transcript_id_version", "ensembl_transcript_id", "ENSMUST00000159265", mouse, "ensembl_transcript_id_version", martL = human)
  Transcript.stable.ID.version Transcript.stable.ID.version.1
1         ENSMUST00000159265.2              ENST00000327381.7
2         ENSMUST00000159265.2              ENST00000518261.1
3         ENSMUST00000159265.2              ENST00000622811.1
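Note that one mouse transcript maps to three human transcripts here, so the mapping can be one-to-many. A minimal base R sketch for collapsing a getLDS() result into a list keyed by mouse ID follows; the data frame is rebuilt by hand from the rows printed above (in practice it would be the value returned by getLDS()), and the columns are accessed by position because getLDS() appends ".1" to the duplicated header name.

```r
# Rows copied from the getLDS() output printed above; normally this data
# frame would be the return value of getLDS() rather than typed by hand.
res <- data.frame(
  Transcript.stable.ID.version   = rep("ENSMUST00000159265.2", 3),
  Transcript.stable.ID.version.1 = c("ENST00000327381.7",
                                     "ENST00000518261.1",
                                     "ENST00000622811.1"),
  stringsAsFactors = FALSE
)

# Group the human IDs (second column) by the mouse ID (first column),
# giving one list element per mouse transcript.
human_by_mouse <- split(res[[2]], res[[1]])

print(human_by_mouse)
```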

It works very well; thanks for such a great answer! Because I'm using an old GTF from GENCODE (vM12), my ID versions don't match the versions stored in Ensembl, so in the end I used the transcript IDs without versions for the conversion.
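Dropping the versions can be done in base R before the lookup. A minimal sketch, assuming the GENCODE IDs have the form shown in the question (a stable ID, a dot, then a version number); the helper names strip_version and map_to_human are hypothetical. The query inside map_to_human mirrors the version-agnostic getLDS() call from the answer and needs biomaRt plus network access, so it is only defined here, not run.

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix,
# e.g. "ENSMUST00000159265.1" -> "ENSMUST00000159265".
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Versioned IDs as they might appear in an older GENCODE GTF.
gencode_ids <- c("ENSMUST00000159265.1", "ENSMUST00000159265.2")
unversioned <- unique(strip_version(gencode_ids))

# Hypothetical wrapper around the answer's version-agnostic query:
# filter on the unversioned mouse ID, return versioned IDs for both species.
# Not called here because it requires biomaRt and a live Ensembl connection.
map_to_human <- function(ids, mouse, human) {
  biomaRt::getLDS(attributes = "ensembl_transcript_id_version",
                  filters = "ensembl_transcript_id",
                  values = ids, mart = mouse,
                  attributesL = "ensembl_transcript_id_version",
                  martL = human)
}
```

With the marts from the answer, a call would look like map_to_human(unversioned, mouse, human).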


It might be difficult to do the mapping using version numbers. GENCODE M12 is based on Ensembl release 87, which is from 2016. There are archived versions of BioMart that you can query, but they are not that fine-grained, so your choices would be Ensembl 80 or 91. That said, the mapping between species shouldn't depend much on the version numbers anyway.
