Package annotate and org.Hs.eg.db: can't look up ENTREZ by ENSEMBL gene
@vitalina-komashko-1048

It seems to me that annotate::lookUp used to work for mapping Ensembl gene IDs to Entrez IDs. I used to use it in the following way:

annotate::lookUp("ENSG00000121410", data = "org.Hs.eg", what = "ENSEMBL")

However, it no longer works for me, even though the mapping is definitely available.

Verify mapping availability:

> temp <- toTable(org.Hs.egENSEMBL)
> head(temp)
  gene_id      ensembl_id
1       1 ENSG00000121410
2       2 ENSG00000175899
3       3 ENSG00000256069
4       9 ENSG00000171428
5      10 ENSG00000156006
6      12 ENSG00000196136

Taking the first Ensembl gene from temp, annotate::lookUp should return 1; however, it returns NA:

> annotate::lookUp("ENSG00000121410", data = "org.Hs.eg", what = "ENSEMBL")
$ENSG00000121410
[1] NA

Looking up by Entrez ID does work:

> annotate::lookUp("1", data = "org.Hs.eg", what = "ENSEMBL")
$`1`
[1] "ENSG00000121410"

Looking up Entrez IDs by gene symbol using ALIAS2EG works!

> annotate::lookUp("STPG1", data = "org.Hs.eg", what = "ALIAS2EG")
$STPG1
[1] "90529"

I can also get the Ensembl-to-Entrez mapping using mapIds:

> AnnotationDbi::mapIds(org.Hs.eg.db, keys = "ENSG00000121410", column = "ENTREZID", keytype = "ENSEMBL")
'select()' returned 1:1 mapping between keys and columns
ENSG00000121410 
            "1"

What am I doing wrong? I have built a package that relies on annotate::lookUp, and it works for gene symbols but not for Ensembl gene IDs.

Thank you very much for your time and help!

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Hs.eg.db_3.8.2   annotate_1.62.0      XML_3.98-1.19        AnnotationDbi_1.46.0
[5] IRanges_2.18.0       S4Vectors_0.22.0     Biobase_2.44.0       BiocGenerics_0.30.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1      digest_0.6.18   bitops_1.0-6    xtable_1.8-4    DBI_1.0.0       RSQLite_2.1.1  
 [7] blob_1.1.1      tools_3.6.0     bit64_0.9-7     RCurl_1.95-4.12 bit_1.1-14      compiler_3.6.0 
[13] pkgconfig_2.0.2 memoise_1.1.0 

Thank you, James! You are right, this does work:

> lookUp("ENSG00000121410", "org.Hs.eg", "ENSEMBL2EG")
$ENSG00000121410
[1] "1"

but I thought I was supposed to use the part of the map's name after the package prefix; for org.Hs.egENSEMBL, that would be "ENSEMBL"...

Anyway, not arguing! :) I will take your advice and convert my code to mapIds. Thank you very much!


Unless you are adding an answer, please use the ADD COMMENT or ADD REPLY buttons.

Most of the BiMaps that are available in a package are central key -> other thing. So org.Hs.egENSEMBL is a mapping from Gene ID (the central key) -> Ensembl ID (the other thing). There is a function called revmap that reverses that mapping:


> get("ENSG00000121410", org.Hs.egENSEMBL)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) : 
  value for "ENSG00000121410" not found
> get("ENSG00000121410", revmap(org.Hs.egENSEMBL))
[1] "1"

Anything with a 2EG at the end of its name is a pre-formed revmap of an existing BiMap that you could hypothetically use.

> grep("2EG$", ls("package:org.Hs.eg.db"), value = TRUE)
 [1] "org.Hs.egACCNUM2EG"       "org.Hs.egALIAS2EG"       
 [3] "org.Hs.egENSEMBL2EG"      "org.Hs.egENSEMBLPROT2EG" 
 [5] "org.Hs.egENSEMBLTRANS2EG" "org.Hs.egENZYME2EG"      
 [7] "org.Hs.egGO2EG"           "org.Hs.egMAP2EG"         
 [9] "org.Hs.egOMIM2EG"         "org.Hs.egPATH2EG"        
[11] "org.Hs.egPMID2EG"         "org.Hs.egREFSEQ2EG"      
[13] "org.Hs.egSYMBOL2EG"       "org.Hs.egUNIGENE2EG" 
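As a minimal sketch (assuming org.Hs.eg.db is installed), you can check that the pre-formed org.Hs.egENSEMBL2EG map gives the same answer as reversing org.Hs.egENSEMBL with revmap. The variable names here are just for illustration.

```r
library(org.Hs.eg.db)

id <- "ENSG00000121410"

# Reverse the Entrez -> Ensembl BiMap on the fly...
via_revmap <- mget(id, revmap(org.Hs.egENSEMBL))

# ...or use the pre-formed Ensembl -> Entrez map
via_2eg <- mget(id, org.Hs.egENSEMBL2EG)

# Both should return the same Entrez ID(s) for this Ensembl gene
identical(sort(via_revmap[[id]]), sort(via_2eg[[id]]))
```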

But it seems to me that it's just way easier (programmatically) to get mappings using select or mapIds rather than trying to figure out which BiMap you need for a given mapping.
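Here is a minimal sketch of the select route, assuming org.Hs.eg.db is installed. The three Ensembl IDs are just the first rows of the toTable() output in the question. Calling it as AnnotationDbi::select avoids clashes with other packages that also export a select() function (dplyr, for example).

```r
library(org.Hs.eg.db)

# Ensembl gene IDs taken from the toTable(org.Hs.egENSEMBL) output above
ens <- c("ENSG00000121410", "ENSG00000175899", "ENSG00000256069")

# select() returns a data.frame with one row per key/value pair, so a
# 1:many mapping shows up as extra rows instead of being dropped
res <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = ens,
                             columns = c("ENTREZID", "SYMBOL"),
                             keytype = "ENSEMBL")
res
```

If you want exactly one value per key instead, mapIds is the better fit (by default it keeps the first match for each key).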

@james-w-macdonald-5106

You should convert to using either mapIds or select rather than lookUp. The annotate package was built back in the day when the annotation objects were based on R environments, and has since been adapted to work with the SQLite-based annotation packages we currently use. But AnnotationDbi (and friends) are intended to be the successor to annotate, and were built specifically to interact with the SQLite-based annotation packages.

It doesn't look like lookUp is doing the right thing here, and maybe something needs to be fixed. However, annotate is old tech, and we do have a set of functions that do what they are supposed to do, so there isn't much impetus to make sure annotate works correctly, let alone a spare person to put in the work to fix something that by all rights should be deprecated...

Anyway, what lookUp is intended to do is call mget to get the ID you want. However, it doesn't seem to be finding the right BiMap, because the direct call does work:


> mget("ENSG00000121410", org.Hs.egENSEMBL2EG)
$ENSG00000121410
[1] "1"

Anyway, you are better off using either select or mapIds, or even mget if you want to keep kicking it old school.
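If a package relies on lookUp(ids, "org.Hs.eg", "ENSEMBL2EG"), one possible replacement is a small wrapper around mapIds. This is only a sketch, assuming org.Hs.eg.db is installed; the function name ensembl_to_entrez is hypothetical.

```r
library(org.Hs.eg.db)

# Hypothetical helper: a stand-in for
# annotate::lookUp(ids, "org.Hs.eg", "ENSEMBL2EG") built on mapIds()
ensembl_to_entrez <- function(ids) {
  # multiVals = "list" keeps every Entrez ID for an Ensembl gene and
  # returns a named list, similar in shape to lookUp()'s output
  AnnotationDbi::mapIds(org.Hs.eg.db,
                        keys = ids,
                        column = "ENTREZID",
                        keytype = "ENSEMBL",
                        multiVals = "list")
}

ensembl_to_entrez("ENSG00000121410")
```

With the default multiVals = "first", mapIds returns a named character vector instead, as in the mapIds call in the question.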


Or maybe it is working correctly, and you are using the wrong thing for the 'what' argument?

> lookUp("ENSG00000121410", "org.Hs.eg", "ENSEMBL2EG")
$ENSG00000121410
[1] "1"

I haven't used lookUp in years, so maybe that's it.
