Question: Converting UniProt Identifier Into Representative Gene Symbol
12 months ago, Dario Strbenac (Australia) wrote:

In UniProt, the entry Q8N4C6 shows Gene: NIN at the top of its page, but the Gene Names field lists two names:

Name: NIN
Synonyms: KIAA1565

If I select the genes column, I get both names.

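> library(UniProt.ws)
> up <- UniProt.ws(taxId = 9606)  # assumed setup: UniProt.ws object for human
> select(up, "Q8N4C6", "GENES", "UNIPROTKB")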
Getting extra data for Q8N4C6
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB        GENES
1    Q8N4C6 NIN KIAA1565

How can I get just the representative gene symbol (i.e. the HGNC Approved Symbol), as shown at the top of the summary web page for Q8N4C6? I'm not sure whether the official symbol is always reported first in the genes column. I was hoping there would be a command such as select(up, "Q8N4C6", "GENE", "UNIPROTKB").

uniprot.ws • 361 views

Right now UniProt.ws is querying the REST API using the 'genes' keyword, so the query you end up passing is

https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes

That returns all of the gene symbols recorded for the protein. Since KIAA1565 got wrapped into NIN back in 2005, you get both, because technically both gene symbols apply. There is, however, a preferred symbol, which you can get using

https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes(PREFERRED)
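As a rough sketch of how that second query could be assembled and its response read from R, assuming the URL pattern quoted above is still being served: the helper names `build_preferred_url()` and `parse_uniprot_tab()` are made up for illustration, and the sample response text is a stand-in, not captured server output.

```r
# Hypothetical helper: build the query URL quoted above for one accession.
build_preferred_url <- function(accession,
                                base = "https://www.uniprot.org/uniprot/") {
  columns <- "id,genes(PREFERRED)"
  paste0(base, "?query=", accession, "&format=tab&columns=", columns)
}

# Hypothetical helper: parse a tab-separated response body into a data frame.
parse_uniprot_tab <- function(body_text) {
  read.delim(text = body_text, stringsAsFactors = FALSE, check.names = FALSE)
}

query_url <- build_preferred_url("Q8N4C6")

# Fetching needs network access and a live endpoint, for example:
# body_text <- paste(readLines(query_url, warn = FALSE), collapse = "\n")

# Stand-in response for illustration only (column headers are assumed).
example_body <- "Entry\tGene\nQ8N4C6\tNIN\n"
parsed <- parse_uniprot_tab(example_body)
```

Keeping the URL construction separate from the parsing makes it possible to check each step without contacting the server.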

Hypothetically we could add this to UniProt.ws, but there are other ways of doing things that are even easier:

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, "Q8N4C6", "SYMBOL", "UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT SYMBOL
1  Q8N4C6    NIN

All else being equal, that should be the go-to solution.
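For more than a handful of accessions, the same annotation-package lookup can be wrapped so that it returns one symbol per input. A minimal sketch, assuming `AnnotationDbi` and `org.Hs.eg.db` are installed; `uniprot_to_symbol()` is a hypothetical helper name, and `multiVals = "first"` is just one possible policy for accessions that map to several symbols, not something recommended in this thread.

```r
# Hypothetical wrapper around AnnotationDbi::mapIds() for UniProt -> gene symbol.
uniprot_to_symbol <- function(ids) {
  # Check for the packages at call time so that defining the function never fails.
  for (pkg in c("AnnotationDbi", "org.Hs.eg.db")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required but not installed.")
    }
  }
  AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                        keys = ids,
                        column = "SYMBOL",
                        keytype = "UNIPROT",
                        multiVals = "first")  # keep the first symbol when several match
}

# Usage (requires the packages above):
# uniprot_to_symbol(c("Q8N4C6", "Q9UM73"))
```

The result is a character vector named by the input accessions, which is often more convenient than a data frame when annotating an existing table.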

 

written 12 months ago by James W. MacDonald
Answer: Converting UniProt Identifier Into Representative Gene Symbol
12 months ago, Mike Smith (EMBL Heidelberg / de.NBI) wrote:

This is not a solution using UniProt.ws, but if you query the REST API directly you get a single gene symbol back, which is presumably always the same as the one that appears on the summary pages.

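library(httr)  # content() parses tab-separated text via readr, which must be installed

# UniProt identifiers to map
my_protein_ids <- c('Q8N4C6', 'Q9UM73')

# Submit the identifiers to the ID mapping service, mapping them to gene names
results <- POST(url = "https://www.uniprot.org/uploadlists/",
                body = list(from = 'ID',
                            to = 'GENENAME',
                            format = 'tab',
                            query = paste(my_protein_ids, collapse = ' ')))
stop_for_status(results)  # fail early on an HTTP error

# Parse the tab-separated response body into a tibble
uniprot_results <- content(results, type = 'text/tab-separated-values',
                           col_names = TRUE,
                           col_types = NULL,
                           encoding = "UTF-8")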
> uniprot_results
# A tibble: 2 x 2
  From   To   
  <chr>  <chr>
1 Q8N4C6 NIN  
2 Q9UM73 ALK
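
Because a mapping table like this only contains rows for identifiers that were found, it can help to align it back to the original input. A small base-R sketch, assuming a result with the `From` and `To` columns shown above; `align_symbols()` is a hypothetical helper and `P00000` is a made-up accession used only to show the unmapped case.

```r
# Hypothetical helper: align a From/To mapping table to the original accessions.
align_symbols <- function(ids, mapping) {
  # match() gives the first matching row per id, or NA when the id is absent
  symbols <- mapping$To[match(ids, mapping$From)]
  names(symbols) <- ids
  symbols
}

# Mapping table holding the two rows shown in the output above
mapping <- data.frame(From = c("Q8N4C6", "Q9UM73"),
                      To = c("NIN", "ALK"),
                      stringsAsFactors = FALSE)

# "P00000" is a made-up accession that is not in the table
aligned <- align_symbols(c("Q9UM73", "P00000", "Q8N4C6"), mapping)
unmapped <- names(aligned)[is.na(aligned)]  # accessions with no symbol returned
```

Checking `unmapped` before merging the symbols into other data avoids silently carrying missing values forward.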