Hi Marc and others,
Recently a funny entry popped up in org.Hs.eg.db during an internal mapping check:
library(org.Hs.eg.db)
select(org.Hs.eg.db, keys='16592263', columns=c('SYMBOL','ENTREZID'), keytype='ENSEMBL')
#    ENSEMBL   SYMBOL  ENTREZID
# 1 16592263 RPL21P28 100131205
select(org.Hs.eg.db, keys='100131205', columns=c('SYMBOL','ENSEMBL'))
#    ENTREZID   SYMBOL         ENSEMBL
# 1 100131205 RPL21P28        16592263
# 2 100131205 RPL21P28 ENSG00000220749
# 3 100131205 RPL21P28 ENSG00000213860   # only in 3.0.0, removed now
grep('ENSG', keys(org.Hs.eg.db, keytype='ENSEMBL'), invert=TRUE, value=TRUE)
# 16592263
I guess this is more of an upstream problem at NCBI (http://www.ncbi.nlm.nih.gov/gene/100131205, fixed now), which is where this 16592263 comes from. So I wonder whether it is possible to enforce some check on the Ensembl IDs (is it true that all Ensembl gene IDs start with ENS? Scratching my head now). The good thing is that this is quite a unique case, and removing it is easy and loses nothing. The entry is present in at least the two versions I have checked (org.Hs.eg.db_3.0.0 and org.Hs.eg.db_3.1.2), so it has been there for a while.
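For what it is worth, here is a minimal sketch of the kind of check I have in mind. It assumes that a well-formed human Ensembl gene ID is "ENSG" followed by 11 digits, which holds for the two valid IDs above but is my assumption rather than anything the package enforces. The helper name find_malformed_ensembl is made up for illustration.

```r
# Hypothetical helper: return the IDs that do not look like human Ensembl gene IDs.
# Assumption: a valid human gene ID is "ENSG" followed by exactly 11 digits.
find_malformed_ensembl <- function(ids, pattern = "^ENSG[0-9]{11}$") {
  ids <- as.character(ids)
  # Keep only the entries that fail the pattern
  ids[!grepl(pattern, ids)]
}

# Example using the three ENSEMBL values returned for Entrez ID 100131205
ids <- c("16592263", "ENSG00000220749", "ENSG00000213860")
bad <- find_malformed_ensembl(ids)
print(bad)
```

Against the real package, the same helper could be called as find_malformed_ensembl(keys(org.Hs.eg.db, keytype='ENSEMBL')). Anchoring the pattern at both ends is stricter than the grep on 'ENSG' above, so it would also flag truncated or padded IDs.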
We are reporting this to NCBI as well, since other people are probably affected too.