Best way to convert UniProt accessions to Entrez Gene identifiers in R

What is the best way to convert UniProt accessions to Entrez Gene identifiers?

What is the best way to reverse the map org.Hs.eg.db::org.Hs.egUNIPROT?

Is there a better approach? It is a pity there is no 'org.Hs.uniprot.db' package.


I just discovered revmap().


revmap() is part of the old BiMap interface. You will be better served using select().

@james-w-macdonald-5106

You could either use the UniProt.ws package, or as you note, you could use the org.Hs.eg.db package. But you don't have to reverse any maps, as the BiMap interface is simply an artifact of a bygone era. These days the cool kids use select().

> library(org.Hs.eg.db)
> uniprots <- Rkeys(org.Hs.egUNIPROT)[1:5]
> select(org.Hs.eg.db, keys = uniprots, columns = "ENTREZID", keytype = "UNIPROT")
  UNIPROT ENTREZID
1  P04217        1
2  V9HWD8        1
3  P01023        2
4  P18440        9
5  Q400J6        9

Or, using UniProt.ws:

> library(UniProt.ws)
> up <- UniProt.ws(taxId=9606)
> select(up, uniprots, "ENTREZ_GENE")
Getting mapping data for P04217 ... and P_ENTREZGENEID
  UNIPROTKB ENTREZ_GENE
1    P04217           1
2    V9HWD8           1
3    P01023           2
4    P18440           9
5    Q400J6           9


Thanks James! Love your 'cool kids' motivation to switch to the new interface :-). Will definitely do!


Small additional question: should I use

import org.Hs.eg.db
importFrom AnnotationDbi select

or

importFrom org.Hs.eg.db org.Hs.eg.db
importFrom AnnotationDbi select


I assume this is a package you are developing, and you are asking about your NAMESPACE file?


Yep, my package depends on the functionality we have been discussing here.


The org.Hs.eg.db package is just a wrapper to allow easy interrogation of an underlying SQLite database. So if you need that package specifically, then I would just put it in your Depends field.

You should note that select() returns one row per match, so any key with a one-to-many mapping appears more than once. As an example, say you have a UniProt ID that maps to two Entrez Gene IDs (this may or may not occur; I haven't checked). In that situation select() will return a data.frame like

UNIPROT    ENTREZID
P12345       23434
P12345       321234

And if you are naive about things, and expect just one Entrez ID to be returned, then you will have problems. If you are just mapping from one ID to another, you can use mapIds(), with multiVals = "first". Or something different, depending on how you want to do things. But that is an easy way to control for one-to-many mappings.
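A minimal sketch of the mapIds() route, assuming AnnotationDbi and org.Hs.eg.db are installed. The accessions are the five from the select() example earlier in the thread; the variable names (first_hit, all_hits, multi) are only illustrative.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Accessions taken from the select() example earlier in this thread
uniprots <- c("P04217", "V9HWD8", "P01023", "P18440", "Q400J6")

# One Entrez Gene ID per accession: keep only the first match.
# Returns a character vector named by the input keys (NA where unmapped).
first_hit <- mapIds(org.Hs.eg.db, keys = uniprots, column = "ENTREZID",
                    keytype = "UNIPROT", multiVals = "first")

# Keep every match instead: returns a named list, one element per accession
all_hits <- mapIds(org.Hs.eg.db, keys = uniprots, column = "ENTREZID",
                   keytype = "UNIPROT", multiVals = "list")

# Accessions that map to more than one Entrez Gene ID
multi <- names(all_hits)[lengths(all_hits) > 1]
```

Inspecting multi before settling on multiVals = "first" shows whether any accession would silently lose a mapping.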

And back to the question at hand, if you are only using select() or mapIds(), then you can just importFrom, rather than importing the whole namespace.
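As a rough sketch of what that could look like in the package files, assuming the package only calls select() and mapIds() and follows the Depends suggestion above (this is one possible layout, not the only valid one):

```r
# NAMESPACE (hand-written, or generated by roxygen2 from
# "@importFrom AnnotationDbi select mapIds" tags)
importFrom(AnnotationDbi, select, mapIds)

# DESCRIPTION would then list, following the advice above:
#   Depends: org.Hs.eg.db
#   Imports: AnnotationDbi
```

With org.Hs.eg.db in Depends, the org.Hs.eg.db object is attached whenever the package is loaded, so it needs no import directive of its own.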


Thank you so much James!