Accession ID Conversion and Missing Results
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Australia

Occasionally, select() from UniProt.ws returns a missing value even though a result should exist. Consider:

library(UniProt.ws)
database <- UniProt.ws(taxId = 9606)
select(x = database, keys = c("P01613", "P01861"), columns = "GENES", keytype = "UNIPROTKB")
Getting extra data for P01861
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB GENES
1    P01613  <NA>
2    P01861 IGHG4

If you check the UniProt website, both P01613 and P01861 have a gene symbol. Why do I get NA for P01613?

@james-w-macdonald-5106
United States

The answer is at the top of the first page you showed. Note that the entry you got isn't P01613; it's P01593. When you query UniProt directly with a deprecated UniProtKB ID, it silently converts it to the new one and presents that page.

Internally, the UniProt.ws package first gets all the available keys for the species you are interested in, and then removes any keys in your query that don't match them. This is in some sense necessary, as you can have problems if you provide UniProtKB IDs that aren't from the species you are asking about. Under the hood, a URI is generated and sent to UniProt. What you are sending right now is

https://www.uniprot.org/mapping/?from=ACC%2BID&to=GENENAME&format=tab&query=P01861

after having P01613 stripped off because it's not current. Ideally you would send both, because UniProt is happy to return what you are asking for, and is even nice enough not to map to the new KB ID:

https://www.uniprot.org/mapping/?from=ACC%2BID&to=GENENAME&format=tab&query=P01861,P01613

If you paste that into a browser, you will see it returns what you expect. However, if you add a mouse ID as well

https://www.uniprot.org/mapping/?from=ACC%2BID&to=GENENAME&format=tab&query=P01861,P01613,Q8K3W0

You still get all the results, only now a mouse symbol has infiltrated them.
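To make the two steps above concrete, here is a minimal base R sketch of the idea, not the package's actual code. The vector current_human_keys is a tiny hypothetical stand-in for the full list of current human accessions, and build_mapping_url() is an illustrative helper name of my own; only the URL layout is taken from the examples above.

```r
# Hypothetical stand-in for the current human accessions the package retrieves;
# the real list is far longer.
current_human_keys <- c("P01593", "P01861", "P04637")

requested <- c("P01613", "P01861")

# Keys absent from the current set are dropped before any request is made.
kept <- requested[requested %in% current_human_keys]

# Assemble a mapping URL in the form shown above (illustrative helper).
build_mapping_url <- function(ids, from = "ACC+ID", to = "GENENAME") {
  paste0(
    "https://www.uniprot.org/mapping/",
    "?from=", utils::URLencode(from, reserved = TRUE),  # "+" becomes "%2B"
    "&to=", to,
    "&format=tab",
    "&query=", paste(ids, collapse = ",")
  )
}

url_sent  <- build_mapping_url(kept)       # what is sent after filtering
url_ideal <- build_mapping_url(requested)  # what you would like to send
```

Here url_sent contains only P01861, which is why P01613 comes back as NA, while url_ideal carries both IDs.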


So, is there a way to convert the IDs from such legacy data sets automatically? Is there backwards compatibility built into the R package?


If there were a simple way around this I would have told you rather than explain why it doesn't do what you expect. I mean, what's the profit in telling you why it doesn't do what you want if I can just say 'do it this way'?

You are free to fork the UniProt.ws package and modify lines 116-120 so that keys that aren't current (or aren't from the species you are querying on) are no longer filtered out, and then it will be 'backward compatible'.
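As a rough illustration of what such a change might look like, here is a sketch in base R. The function filter_keys() and its keep_unknown argument are hypothetical and are not taken from the UniProt.ws source; the real lines in the package will differ.

```r
# Hypothetical helper, not part of UniProt.ws: filter keys against a known set,
# with an option to pass unrecognised keys through untouched.
filter_keys <- function(keys, valid, keep_unknown = FALSE) {
  unknown <- setdiff(keys, valid)
  if (length(unknown) > 0 && !keep_unknown) {
    # Report what is being dropped instead of removing it silently.
    warning("Dropping keys not in the current set: ",
            paste(unknown, collapse = ", "))
    return(keys[keys %in% valid])
  }
  keys
}

valid <- c("P01593", "P01861")  # hypothetical current human keys
strict  <- suppressWarnings(filter_keys(c("P01613", "P01861"), valid))
lenient <- filter_keys(c("P01613", "P01861"), valid, keep_unknown = TRUE)
```

Bear in mind that with keep_unknown = TRUE nothing stops an ID from another species, such as Q8K3W0, from going through as well.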
