Conversion of UniProt to Ensembl
Recep
Germany

Hello, I'm new to bioinformatics and am trying to convert UniProt IDs to Ensembl IDs. I was able to install biomaRt and read my data with read.csv.

Now I need to convert the IDs, but how can I do that? Could you point me to any documentation? I couldn't find anything about it.

Thank you.

biomaRt ensembldb UniProt.ws

Note that this can also be done via the UniProt.ws package:

library(UniProt.ws)
up <- UniProt.ws()
univals <- head(keys(up, "UniProtKB"))
select(up, keys = univals, columns = c("accession", "id", "xref_ensembl"), keytype = "UniProtKB")
        From      Entry  Entry.Name                                                     Ensembl
1 A0A0C5B5G6 A0A0C5B5G6 MOTSC_HUMAN                                                        <NA>
2 A0A1B0GTW7 A0A1B0GTW7 CIROP_HUMAN                                          ENST00000637218.2;
3     A0JNW5     A0JNW5 UH1BL_HUMAN ENST00000279907.12 [A0JNW5-1];ENST00000356828.7 [A0JNW5-2];
4     A0JP26     A0JP26 POTB3_HUMAN  ENST00000611217.5 [A0JP26-1];ENST00000612601.2 [A0JP26-2];
5     A0PK11     A0PK11 CLRN2_HUMAN                                          ENST00000511148.2;
6     A1A4S6     A1A4S6 RHG10_HUMAN                                          ENST00000336498.8;

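Though the IDs appear different: these are versioned Ensembl transcript IDs (ENST...), some tagged with a UniProt isoform, rather than the Ensembl gene IDs (ENSG...) that biomaRt returns.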
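
If you use this route, note that the Ensembl column packs several transcript IDs, each optionally followed by an isoform tag, into one semicolon-separated string. A minimal base-R sketch for reshaping it to one row per transcript, assuming the `Entry` and `Ensembl` column names shown in the output above; the data frame is typed in from that output rather than queried live:

```r
# Rows typed in from the select() output above (not a live query)
res <- data.frame(
  Entry   = c("A0A0C5B5G6", "A0A1B0GTW7", "A0JNW5"),
  Ensembl = c(NA,
              "ENST00000637218.2;",
              "ENST00000279907.12 [A0JNW5-1];ENST00000356828.7 [A0JNW5-2];"),
  stringsAsFactors = FALSE
)

# Split on ";" (NA entries become "" first), then trim and drop empty pieces
pieces <- strsplit(ifelse(is.na(res$Ensembl), "", res$Ensembl), ";", fixed = TRUE)
pieces <- lapply(pieces, function(p) { p <- trimws(p); p[nzchar(p)] })

# One row per transcript; entries without any Ensembl ID drop out here
long <- data.frame(
  Entry = rep(res$Entry, lengths(pieces)),
  piece = unlist(pieces),
  stringsAsFactors = FALSE
)

# "ENST00000279907.12 [A0JNW5-1]" -> transcript "ENST00000279907.12", isoform "A0JNW5-1"
long$transcript <- sub(" .*$", "", long$piece)
long$isoform <- ifelse(grepl("[", long$piece, fixed = TRUE),
                       sub("^.*\\[(.*)\\]$", "\\1", long$piece), NA)
long$piece <- NULL
long
```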

@james-w-macdonald-5106
United States

I will provide an example below. But first, there is a vignette for biomaRt (actually two), and it has extensive examples of how to use the package. When you are trying to figure out how to use a package, reading the vignette (possibly multiple times) should be your first step. What you want to do could easily be extrapolated from the information there, and if you want to learn how to use R/Bioconductor, reading examples and inferring how to do what you want from them is an invaluable skill.

Another skill is asking good questions. You don't say which species, so I will assume human. When asking a question, try to think of what relevant information somebody else might need to answer it, and provide that.

## instantiate a mart object
> library(biomaRt)
> mart <- useEnsembl("ensembl", "hsapiens_gene_ensembl")
## get some UniProt IDs (UniProt.ws() defaults to human, taxId = 9606)
> library(UniProt.ws)
> up <- UniProt.ws()
> univals <- head(keys(up, "UniProtKB"))
> univals
[1] "A0A0C5B5G6" "A0A1B0GTW7" "A0JNW5"     "A0JP26"     "A0PK11"    
[6] "A1A4S6"  
> getBM(c("ensembl_gene_id","uniprot_gn_id"), "uniprot_gn_id", univals, mart)
  ensembl_gene_id uniprot_gn_id
1 ENSG00000283654    A0A1B0GTW7
2 ENSG00000111647        A0JNW5
3 ENSG00000278699        A0JP26
4 ENSG00000278522        A0JP26
5 ENSG00000249581        A0PK11
6 ENSG00000071205        A1A4S6

It's important to remember to ask getBM to return your input IDs, so you know which Ensembl gene ID matches up to a given UniProt ID.
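
Returning the input IDs also makes it easy to spot IDs that did not map (A0A0C5B5G6 is absent above) and IDs that map to more than one gene (A0JP26). A base-R sketch, using the getBM() output above typed in as a data frame rather than a live query:

```r
# Input IDs and getBM() result, typed in from the output above
univals <- c("A0A0C5B5G6", "A0A1B0GTW7", "A0JNW5", "A0JP26", "A0PK11", "A1A4S6")
bm <- data.frame(
  ensembl_gene_id = c("ENSG00000283654", "ENSG00000111647", "ENSG00000278699",
                      "ENSG00000278522", "ENSG00000249581", "ENSG00000071205"),
  uniprot_gn_id   = c("A0A1B0GTW7", "A0JNW5", "A0JP26", "A0JP26", "A0PK11", "A1A4S6"),
  stringsAsFactors = FALSE
)

# Left join: keep every input ID, with NA where biomaRt returned nothing
mapped <- merge(data.frame(uniprot_gn_id = univals, stringsAsFactors = FALSE),
                bm, by = "uniprot_gn_id", all.x = TRUE)

# Input IDs with no Ensembl gene
unmapped <- mapped$uniprot_gn_id[is.na(mapped$ensembl_gene_id)]

# Input IDs that map to more than one Ensembl gene
counts <- table(bm$uniprot_gn_id)
multi <- names(counts)[as.vector(counts) > 1]
```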


I noticed that the earlier UniProt.ws query returns Ensembl transcript IDs rather than gene IDs. I have updated UniProt.ws so that it can translate IDs directly:

suppressPackageStartupMessages({
    library(UniProt.ws)
})
packageVersion("UniProt.ws")
#> [1] '2.41.6'
up <- UniProt.ws()
univals <- c("A0A0C5B5G6", "A0A1B0GTW7", "A0JNW5", "A0JP26", "A0PK11")
select(up, keys = univals, to = "Ensembl")
#> Warning: IDs not mapped: A0A0C5B5G6
#>         From                 To
#> 1 A0A1B0GTW7  ENSG00000283654.3
#> 2     A0JNW5 ENSG00000111647.13
#> 3     A0JP26  ENSG00000278522.5
#> 4     A0PK11  ENSG00000249581.2

Created on 2023-09-14 with [reprex v2.0.2](https://reprex.tidyverse.org)
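
These gene IDs carry Ensembl version suffixes (.3, .13, ...), while getBM() returns unversioned IDs, so the version has to be stripped before the two sources can be compared. A base-R sketch using the outputs from this thread typed in as data frames (not live queries); on these rows the two agree, except that biomaRt also reports ENSG00000278699 for A0JP26:

```r
# UniProt.ws result from the reprex above (versioned gene IDs)
up_res <- data.frame(
  From = c("A0A1B0GTW7", "A0JNW5", "A0JP26", "A0PK11"),
  To   = c("ENSG00000283654.3", "ENSG00000111647.13",
           "ENSG00000278522.5", "ENSG00000249581.2"),
  stringsAsFactors = FALSE
)
# getBM() result from the biomaRt answer above (unversioned gene IDs)
bm <- data.frame(
  uniprot_gn_id   = c("A0A1B0GTW7", "A0JNW5", "A0JP26", "A0JP26", "A0PK11"),
  ensembl_gene_id = c("ENSG00000283654", "ENSG00000111647", "ENSG00000278699",
                      "ENSG00000278522", "ENSG00000249581"),
  stringsAsFactors = FALSE
)

# Drop the trailing ".N" version so both sources use the same ID form
up_res$gene <- sub("\\.[0-9]+$", "", up_res$To)

# UniProt/gene pairs reported by both sources
both <- merge(up_res, bm, by.x = c("From", "gene"),
              by.y = c("uniprot_gn_id", "ensembl_gene_id"))

# Pairs reported by biomaRt but not by UniProt.ws
bm_key  <- paste(bm$uniprot_gn_id, bm$ensembl_gene_id)
bm_only <- bm[!(bm_key %in% paste(up_res$From, up_res$gene)), ]
```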


Hi, thanks, this is really helpful. I have downloaded the UniProt human ID mapping file, but I'm not sure whether its IDs are the same as UniProtKB IDs or how to link it to these results. Any help is much appreciated. Thanks


Did you mean a mapping file that looks like the example from the UniProt website?

I am not sure how you would use it, but the data in the file should already be accessible via the UniProt API.
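
If you do want to work with the downloaded file directly, here is a minimal sketch. It assumes the three-column, tab-separated layout of UniProt's `*_idmapping.dat` files (accession, ID type, ID); the rows below are illustrative, built from IDs in this thread rather than copied from the real file, and the ID type labels ("Ensembl", "Ensembl_TRS") and the file name in the comment are assumptions to check against your copy:

```r
# Illustrative rows in the assumed layout: accession <tab> ID type <tab> ID
lines <- c(
  "A0JNW5\tUniProtKB-ID\tUH1BL_HUMAN",
  "A0JNW5\tEnsembl\tENSG00000111647.13",
  "A0JNW5\tEnsembl_TRS\tENST00000279907.12",
  "A1A4S6\tUniProtKB-ID\tRHG10_HUMAN",
  "A1A4S6\tEnsembl_TRS\tENST00000336498.8"
)

# For the real file, pass the (assumed) path instead of text=, e.g.
# read.delim("HUMAN_9606_idmapping.dat.gz", header = FALSE, ...)
idmap <- read.delim(text = lines, header = FALSE,
                    col.names = c("accession", "id_type", "id"),
                    stringsAsFactors = FALSE)

# Keep only accession -> Ensembl gene rows and strip version suffixes
ens <- idmap[idmap$id_type == "Ensembl", c("accession", "id")]
ens$id <- sub("\\.[0-9]+$", "", ens$id)

# Look up your own accessions (e.g. a column read with read.csv)
my_ids <- c("A0JNW5", "A1A4S6")
genes <- ens$id[match(my_ids, ens$accession)]
genes
```

In these illustrative rows A1A4S6 only has a transcript-level entry, so its gene lookup returns NA.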

The above example maps UniProt accession IDs to Ensembl IDs (UniProtKB_AC-ID -> Ensembl). Accession IDs are not in general the same as UniProtKB IDs, although for some entries they are identical.

For example, the identifier 001R_FRG3G returns a different UniProtKB ID, the accession Q6GZX4:

> select(up, keys = "001R_FRG3G", to = "UniProtKB", columns = "accession")
#>         From  Entry
#> 1 001R_FRG3G Q6GZX4
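
Because the two kinds of identifier look different, you can check which kind your file contains before choosing how to query it. A sketch using the accession regular expression published in UniProt's documentation (extended here with an optional isoform suffix such as -2) and a looser, assumed pattern for entry names of the form MNEMONIC_SPECIES:

```r
ids <- c("Q6GZX4", "A0A0C5B5G6", "A0JNW5-2", "001R_FRG3G", "MOTSC_HUMAN")

# UniProt accession format, optionally followed by an isoform suffix
acc_re <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})(-[0-9]+)?$"

# Rough entry-name pattern (an assumption): mnemonic, "_", species code
name_re <- "^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$"

id_kind <- ifelse(grepl(acc_re, ids), "accession",
                  ifelse(grepl(name_re, ids), "entry name", "unknown"))
data.frame(id = ids, kind = id_kind)
```

Entry names could then be converted to accessions first, as in the select() call above.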

Eva
Spain

Although some months have passed, I thought it would be useful to post another way to get UniProt IDs using biomaRt, in case someone needs it or has problems with the previous answers.

library(biomaRt)
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl") # mouse
#mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl") # human

# no filters/values, so this returns every gene in the dataset
Uniprot <- getBM(
  attributes = c("ensembl_gene_id", "uniprotswissprot"),
  mart = mart)

colnames(Uniprot) <- c("Ensembl_ID", "UniProt" )

Note: biomaRt has several attributes related to UniProt, including uniprotswissprot and uniprot_gn_id. I used uniprotswissprot because it contains only reviewed UniProtKB entries (Swiss-Prot).
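
Because that query has no filter, the table also contains genes without a Swiss-Prot accession. A sketch of cleaning it and looking up your own IDs (for example a column read with read.csv); the rows are hypothetical stand-ins for the renamed getBM() result, and missing accessions are assumed to come back as empty strings or NA:

```r
# Hypothetical stand-in for the renamed getBM() result (illustration only)
Uniprot <- data.frame(
  Ensembl_ID = c("ENSG00000111647", "ENSG00000071205",
                 "hypothetical_gene_1", "hypothetical_gene_2"),
  UniProt    = c("A0JNW5", "A1A4S6", "", NA),
  stringsAsFactors = FALSE
)

# Drop genes without a Swiss-Prot accession (empty string or NA)
has_up  <- !is.na(Uniprot$UniProt) & nzchar(Uniprot$UniProt)
Uniprot <- Uniprot[has_up, ]

# Your own IDs, e.g. read.csv("my_ids.csv")$UniProt (hypothetical file/column)
my_ids <- c("A1A4S6", "A0JNW5", "Q6GZX4")

# match() keeps the first gene per accession; NA means not found
result <- data.frame(
  UniProt    = my_ids,
  Ensembl_ID = Uniprot$Ensembl_ID[match(my_ids, Uniprot$UniProt)],
  stringsAsFactors = FALSE
)
result
```

If one accession can map to several genes, merge(..., all.x = TRUE) keeps every pair instead of only the first.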
