Question: Problem mapping Ensembl protein IDs to Entrez IDs using org.Hs.eg.db
9 months ago, Marcus Aurelius wrote:

I am using org.Hs.eg.db to map Ensembl protein IDs (from Stringdb) to Entrez IDs. I used to be able to map almost all of the Ensembl IDs in Stringdb, but for some reason I no longer can. For example:

> unlist(mget(c("ENSP00000000233"), map, ifnotfound=NA))
ENSP00000000233 
             NA 

When I look up the same ID on ensembl.org (http://www.ensembl.org/Homo_sapiens/Search/Results?q=ENSP00000000233;site=ensembl;facet_species=Human), I find that it maps to ARF5. Is there a fix for this, or is there a bug in the new version of org.Hs.eg.db?

9 months ago, James W. MacDonald (United States) wrote:

If you do the corresponding query in the Biomart interface at ensembl.org, you get nothing returned, so this is an issue with the Biomart server, not the biomaRt package.


@JamesW.MacDonald If you look at the protein aliases in the Stringdb network (you can download the protein.aliases.v10.txt.gz file from the String db website), you can see that ENSP00000000233 maps to "ADP-RIBOSYLATION FACTOR 5 [*103188]", which is ARF5.

Reply written 8 months ago by Poincare (modified 8 months ago)

That may well be true. But I think there is a fundamental misunderstanding here. We are simply packaging the data that exist in a couple of databases in a way that makes it easier to use. In the case of the org.Hs.eg.db package, the central database used is the Gene DB from NCBI, so the data are necessarily Entrez Gene ID-centric.

While there are quite a few mappings from Entrez Gene -> Ensembl IDs, it's not unusual for there to be disagreements between the two annotation groups, and so it's not that unexpected that an annotation database built using NCBI IDs would not be completely comprehensive when trying to map IDs from a different annotation group.
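As a rough illustration of the coverage gap described above, the sketch below checks which query IDs the package knows about before attempting the mapping. `split_by_coverage` is a hypothetical helper written for this example, not part of any package. The sketch assumes org.Hs.eg.db and AnnotationDbi are installed and uses the `ENSEMBLPROT` keytype; what it returns depends on the installed annotation release.

```r
# Hypothetical helper: split query IDs into those present in a vector of
# valid keys and those that are absent.
split_by_coverage <- function(ids, known) {
  present <- ids %in% known
  list(mappable = ids[present], missing = ids[!present])
}

ids <- c("ENSP00000000233")

# Guarded so the script still runs when the annotation package is absent.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  known <- AnnotationDbi::keys(db, keytype = "ENSEMBLPROT")
  cov <- split_by_coverage(ids, known)

  # select()/mapIds() stop with an error if none of the keys are valid,
  # so only pass IDs that are actually present in the package.
  if (length(cov$mappable) > 0) {
    entrez <- AnnotationDbi::mapIds(db, keys = cov$mappable,
                                    column = "ENTREZID",
                                    keytype = "ENSEMBLPROT",
                                    multiVals = "first")
    print(entrez)
  }
  message(length(cov$missing), " of ", length(ids),
          " IDs have no entry in this package version")
}
```

Running the coverage check on the full STRING ID list should give a quick count of how many Ensembl protein IDs have no counterpart in the NCBI-centric data.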

You will find exactly the same issues if you use biomaRt or one of the EnsDb packages to map Ensembl protein IDs to Entrez Gene IDs. There are lots of gaps. The best advice is to stay within the annotation group from which you got your IDs. So if you have Ensembl-based IDs, use biomaRt or the EnsDb packages to do the mapping. If you have UCSC or NCBI IDs, then use the OrgDb or TxDb packages that the Bioconductor core team supplies.
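For Ensembl-based IDs, a biomaRt query might look like the sketch below. It assumes the biomaRt package is installed, that the Ensembl BioMart server is reachable, and that the Entrez attribute is named `entrezgene_id` (older Ensembl releases called it `entrezgene`). `find_gaps` is a hypothetical helper for this example. As noted above, gaps are still to be expected, so the result may be empty for some IDs.

```r
# Hypothetical helper: query IDs that are absent from a getBM() result or
# that came back without an Entrez Gene ID.
find_gaps <- function(ids, res) {
  ok <- res$ensembl_peptide_id[!is.na(res$entrezgene_id)]
  setdiff(ids, ok)
}

ids <- c("ENSP00000000233")

# Guarded so the script still runs without biomaRt or without network access.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  res <- tryCatch({
    mart <- biomaRt::useEnsembl(biomart = "ensembl",
                                dataset = "hsapiens_gene_ensembl")
    biomaRt::getBM(attributes = c("ensembl_peptide_id",
                                  "entrezgene_id",
                                  "hgnc_symbol"),
                   filters = "ensembl_peptide_id",
                   values = ids,
                   mart = mart)
  }, error = function(e) NULL)  # e.g. server unavailable

  if (!is.null(res)) {
    print(res)                  # whatever mappings the server returned
    print(find_gaps(ids, res))  # IDs that still lack an Entrez Gene ID
  }
}
```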

Reply written 8 months ago by James W. MacDonald