Converting UniProt Identifier Into Representative Gene Symbol
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Australia

In UniProt, an entry such as Q8N4C6 is shown with Gene: NIN, but its Gene Names field lists both a name and a synonym:

Name: NIN
Synonyms: KIAA1565

If I select the genes column, I get both names.

> select(up, "Q8N4C6", "GENES", "UNIPROTKB")
Getting extra data for Q8N4C6
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB        GENES
1    Q8N4C6 NIN KIAA1565

How can I get just the representative gene symbol (i.e. the Approved Symbol by HGNC), as shown at the top of the summary web page for Q8N4C6? I'm uncertain whether the official symbol is always reported first in the genes column. I was hoping there would be a select(up, "Q8N4C6", "GENE", "UNIPROTKB") command available.


Right now UniProt.ws is querying the REST API using the 'genes' keyword, so the query you end up passing is

https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes

This returns all HGNC symbols for that protein. Since KIAA1565 was merged into NIN back in 2005, you get both, because technically both gene symbols apply. There is, however, a preferred symbol, which you can get using

https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes(PREFERRED)

Hypothetically we could add this to UniProt.ws, but there are other ways of doing things that are even easier:

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, "Q8N4C6", "SYMBOL", "UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT SYMBOL
1  Q8N4C6    NIN

All things being equal, this should be the go-to solution.
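The two URLs above use the older www.uniprot.org query syntax. As a minimal sketch, assuming the current REST API at rest.uniprot.org and assuming that its `gene_primary` return field corresponds to the preferred symbol, the equivalent request URL could be built as below. The helper name `build_uniprot_gene_url` is hypothetical, and the code only constructs the URL; it does not contact the server.

```r
# Hypothetical helper: build a search URL that asks the UniProt REST API
# for the accession and the primary gene name of each protein.
# Assumptions: endpoint rest.uniprot.org/uniprotkb/search and the
# return field name `gene_primary`.
build_uniprot_gene_url <- function(accessions) {
  stopifnot(is.character(accessions), length(accessions) > 0)
  # e.g. "accession:Q8N4C6 OR accession:Q9UM73"
  query <- paste0("accession:", accessions, collapse = " OR ")
  paste0(
    "https://rest.uniprot.org/uniprotkb/search",
    "?query=", utils::URLencode(query, reserved = TRUE),
    "&fields=accession,gene_primary",
    "&format=tsv"
  )
}

url <- build_uniprot_gene_url(c("Q8N4C6", "Q9UM73"))
print(url)
```

If the assumptions hold, the resulting URL can be passed to read.delim() to obtain a two-column table of accessions and primary gene names.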

 

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

This is not a solution using UniProt.ws, but if you query the REST API directly you get a single gene symbol returned, which is presumably always the same as the one that appears on the summary pages.

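library(httr)

# UniProt accessions to map to gene names
my_protein_ids <- c('Q8N4C6', 'Q9UM73')

# Submit the accessions to the ID-mapping service, asking for gene names
results <- POST(url = "https://www.uniprot.org/uploadlists/",
                body = list(from = 'ID',
                            to = 'GENENAME',
                            format = 'tab',
                            query = paste(my_protein_ids, collapse = ' ')))

# Parse the tab-separated response into a table (parsing relies on the readr package)
uniprot_results <- content(results, type = 'text/tab-separated-values',
                           col_names = TRUE,
                           col_types = NULL,
                           encoding = "UTF-8")

> uniprot_results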
# A tibble: 2 x 2
  From   To   
  <chr>  <chr>
1 Q8N4C6 NIN  
2 Q9UM73 ALK

Is it possible to apply this code to a column of Swiss-Prot IDs (in an Excel file)? Since these IDs come from a proteomics analysis, some proteins are ambiguous.

I tried this, but got the wrong output shown below:

library(httr)
library(readxl)  # provides read_xlsx()
data <- read_xlsx("reference_set.xlsx", col_names = TRUE)
my_protein_ids <- data$proteins

results <- POST(url = "https://www.uniprot.org/uploadlists/",
                body = list(from = 'ID',
                            to = 'GENENAME',
                            format = 'tab',
                            query = paste(my_protein_ids, collapse = ' ')))

uniprot_results <- content(results, type = 'text/tab-separated-values', 
                           col_names = TRUE, 
                           col_types = NULL, 
                           encoding = "UTF-8")

Output:

<html>
<head><title>405 Not Allowed</title></head>
<body>
<center>405 Not Allowed</center>
<center>nginx/1.21.6</center>
</body>
</html>

Here is a sample of the IDs:

Q9BVI4
Q12802
Y-FGCZCont00180
Y-FGCZCont00092
Q8TEJ3
Q5SXM2
P07478;Q8NHM4
Y-FGCZCont00261
P04264
P51532
P02452
P05997
A0A096LP49
Q8NFW1
Q8N2M8
Q96GP6
P02461
P35527
P13645
Y-FGCZCont00416
P35908;Y-FGCZCont00285
Y-FGCZCont00406
P02538;Y-FGCZCont00035
Y-FGCZCont00062
Q9Y666
Q92688
Y-FGCZCont00402
Q2M2I5;Y-FGCZCont00425
Q8WUA4
Q14031
Q8IWN7
P08123
Q5VST9

I think Jim's answer above using select() still applies, but it is also possible with UniProt.ws.

Note that this only works for valid UniProt accession identifiers from your list. This means that IDs such as Y-FGCZCont00180 may not work.

library(UniProt.ws)
mapUniProt("UniProtKB_AC-ID", "UniProtKB", query = c("Q9BVI4", "Q12802"))
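Since only valid accessions can be mapped, it may help to clean the ID column first. The following is a minimal sketch, assuming that entries such as P07478;Q8NHM4 are semicolon-separated groups of IDs and that contaminant IDs such as Y-FGCZCont00180 should simply be dropped. The helper name `clean_protein_ids` is hypothetical; the regular expression is the accession format published in the UniProt help pages.

```r
# Regular expression for UniProtKB accession numbers, anchored so that
# the whole string must match.
uniprot_ac_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]",
  "|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Hypothetical helper: split grouped entries on ";", trim whitespace,
# and keep only the unique strings that look like UniProt accessions.
clean_protein_ids <- function(ids) {
  parts <- trimws(unlist(strsplit(ids, ";", fixed = TRUE)))
  unique(parts[grepl(uniprot_ac_pattern, parts)])
}

# A few IDs taken from the sample list in the question
raw_ids <- c("Q9BVI4", "Y-FGCZCont00180", "P07478;Q8NHM4",
             "A0A096LP49", "P35908;Y-FGCZCont00285")
valid_ids <- clean_protein_ids(raw_ids)
print(valid_ids)
```

The cleaned vector can then be passed as the query argument. Note that splitting discards the information about which accessions belonged to the same group, so keep the original column if you need to join the gene symbols back onto it.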