Question: Problem mapping Ensembl IDs to gene symbols with AnnotationDbi mapIds
25 days ago, vd4mmind wrote:


I am having trouble using AnnotationDbi to convert my Ensembl IDs to gene symbols. I believe the code below should work, but I am getting a lot of NAs. Some of these are expected, but others do have gene symbols when I look up those ENSG IDs on the Ensembl website, and those symbols are missing from my new data frame. For example, ENSG00000239975 has the symbol IGKV1D-33, but mapIds returns NA for it. Over 200 IDs have no gene symbol, so my hunch is that many more are being missed.

I believe the libraries and annotation databases are up to date, since I am using the latest builds and installing them via biocLite. I also see that the mappings are 1:many, whereas I was expecting 1:1, and I do not understand how to resolve this. I looked through a number of blog posts and could not find anything wrong with my code. select() also does not work for me and throws an error. What am I missing, and how can this be done properly?

The object I am running this on contains differentially expressed genes from edgeR output (a topTags object). It holds only the DEGs that pass my thresholds, so it covers not all expressed genes but about 2.6k genes.

gene.ids <- AnnotationDbi::mapIds(, keys=rownames(degs.DN.v3),keytype="ENSEMBL", column="SYMBOL")
degs.DN.v3$table$genes <- data.frame(ENSEMBL=rownames(degs.DN.v3), SYMBOL=gene.ids)


R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vivlib_1.0.0                devtools_1.13.5             geneplotter_1.58.0         
 [4] annotate_1.58.0             XML_3.98-1.11               Hmisc_4.1-1                
 [7] Formula_1.2-3               survival_2.41-3             lattice_0.20-35            
[10] plotly_4.7.1                Glimma_1.8.2                gridExtra_2.3              
[13] dplyr_0.7.5                 sva_3.28.0                  genefilter_1.62.0          
[16] mgcv_1.8-23                 nlme_3.1-137                edgeR_3.22.2               
[19] DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.0         
[22] BiocParallel_1.14.1         GenomicRanges_1.32.3        GenomeInfoDb_1.16.0        
[25] limma_3.36.1                BiocInstaller_1.30.0        pathview_1.20.0            
[28]          AnnotationDbi_1.42.1        IRanges_2.14.10            
[31] S4Vectors_0.18.2            Biobase_2.40.0              BiocGenerics_0.26.0        
[34] VennDiagram_1.6.20          futile.logger_1.4.3         GeneOverlap_1.16.0         
[37] DOSE_3.6.0                  clusterProfiler_3.8.1       dendsort_0.3.3             
[40] statmod_1.4.30              viridis_0.5.1               viridisLite_0.3.0          
[43] matrixStats_0.53.1          tibble_1.4.2                pheatmap_1.0.10            
[46] ggplot2_2.2.1               gplots_3.0.1                scales_0.5.0               
[49] RColorBrewer_1.1-2          reshape2_1.4.3              plyr_1.8.4                 
[52] tidyr_0.8.1                

loaded via a namespace (and not attached):
 [1] backports_1.1.2        fastmatch_1.1-0        igraph_1.2.1           lazyeval_0.2.1        
 [5] splines_3.5.0          digest_0.6.15          htmltools_0.3.6        GOSemSim_2.6.0        
 [9] GO.db_3.6.0            gdata_2.18.0           magrittr_1.5           checkmate_1.8.5       
[13] memoise_1.1.0          cluster_2.0.7-1        Biostrings_2.48.0      enrichplot_1.0.2      
[17] colorspace_1.3-2       blob_1.1.1             ggrepel_0.8.0          jsonlite_1.5          
[21] RCurl_1.95-4.10        graph_1.58.0           bindr_0.1.1            glue_1.2.0            
[25] gtable_0.2.0           zlibbioc_1.26.0        XVector_0.20.0         UpSetR_1.3.3          
[29] Rgraphviz_2.24.0       futile.options_1.0.1   DBI_1.0.0              Rcpp_0.12.17          
[33] xtable_1.8-2           htmlTable_1.12         units_0.5-1            foreign_0.8-70        
[37] bit_1.1-14             htmlwidgets_1.2        httr_1.3.1             fgsea_1.6.0           
[41] acepack_1.4.1          pkgconfig_2.0.1        nnet_7.3-12            locfit_1.5-9.1        
[45] labeling_0.3           tidyselect_0.2.4       rlang_0.2.1            munsell_0.4.3         
[49] tools_3.5.0            RSQLite_2.1.1          ggridges_0.5.0         stringr_1.3.1         
[53] yaml_2.1.19            knitr_1.20             bit64_0.9-7            caTools_1.17.1        
[57] purrr_0.2.5            KEGGREST_1.20.0        ggraph_1.0.1           bindrcpp_0.2.2        
[61] formatR_1.5            KEGGgraph_1.40.0       DO.db_2.9              compiler_3.5.0        
[65] rstudioapi_0.7         curl_3.2               png_0.1-7              tweenr_0.1.5          
[69] stringi_1.1.7          Matrix_1.2-14          pillar_1.2.3           data.table_1.11.4     
[73] cowplot_0.9.2          bitops_1.0-6           qvalue_2.12.0          R6_2.2.2              
[77] latticeExtra_0.6-28    KernSmooth_2.23-15     lambda.r_1.2.3         MASS_7.3-49           
[81] gtools_3.5.0           assertthat_0.2.0       withr_2.1.2            GenomeInfoDbData_1.1.0
[85] udunits2_0.13          rpart_4.1-13           rvcheck_0.1.0          git2r_0.21.0          
[89] ggforce_0.1.2          base64enc_0.1-3  
25 days ago, Aaron Lun (Cambridge, United Kingdom) wrote:

I daresay that this is because the annotation object you are using is based on Entrez gene identifiers. It's possible that genes that are only described in the Ensembl annotation will not be recorded in that object. If you're working with Ensembl identifiers, it makes more sense to use an Ensembl annotation object:

mapIds(EnsDb.Hsapiens.v86, keys="ENSG00000239975", keytype="GENEID", column="SYMBOL")
## ENSG00000239975 
##     "IGKV1D-33"
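
The same call extends to a whole vector of IDs. The following is only a minimal sketch, assuming that EnsDb.Hsapiens.v86 is installed; `ens.ids` and `annot` are hypothetical names, with `ens.ids` standing in for the row names of the DE table in the question. It relies on the `multiVals` argument of mapIds, whose default of "first" returns a single symbol per key, so the result is 1:1 rather than 1:many.

```r
# Sketch only: assumes the EnsDb.Hsapiens.v86 package is installed.
library(EnsDb.Hsapiens.v86)

# Hypothetical input; in practice this would be rownames() of the DE table.
ens.ids <- c("ENSG00000239975", "ENSG00000141510")

# multiVals="first" (the default) keeps one symbol per key, giving a 1:1 map.
symbols <- mapIds(EnsDb.Hsapiens.v86, keys=ens.ids,
                  keytype="GENEID", column="SYMBOL", multiVals="first")

# Pair each ID with its symbol and count the keys that remain unmapped.
annot <- data.frame(ENSEMBL=ens.ids, SYMBOL=unname(symbols),
                    stringsAsFactors=FALSE)
sum(is.na(annot$SYMBOL))
```

Setting multiVals="list" instead would return every symbol recorded for each key, which may help when checking where 1:many mappings come from.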

P.S. Your question has nothing to do with edgeR, as the above applies regardless of how you got your set of Ensembl IDs.


Makes sense. Yes, these are Entrez ID based. Thanks a lot. I will take a look and try to use the Ensembl library.

P.S: removed the edgeR tag. 

written 25 days ago by vd4mmind (modified 25 days ago)

Perfect, it works, and yes, it is the Ensembl annotation that I should be using for this project. I did not have that information from my collaborators, which is why I had the discrepancy. Thanks.

written 25 days ago by vd4mmind

Just one more note on the EnsDb databases/packages. The package Aaron was using above is based on Ensembl release 86. You should make sure to use annotations from the same Ensembl release throughout your analysis. While the release 86 EnsDb is the most recent one provided as an annotation package, you can get EnsDbs for more recent releases from AnnotationHub:

> library(AnnotationHub)
> query(AnnotationHub(), "EnsDb.Hsapiens")
snapshotDate(): 2018-05-18
AnnotationHub with 6 records
# snapshotDate(): 2018-05-18
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'

  AH53211 | Ensembl 87 EnsDb for Homo Sapiens
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens
  AH56681 | Ensembl 89 EnsDb for Homo Sapiens
  AH57757 | Ensembl 90 EnsDb for Homo Sapiens
  AH60773 | Ensembl 91 EnsDb for Homo Sapiens
  AH60977 | Ensembl 92 EnsDb for Homo Sapiens
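
As the listing itself hints ('object[["AH53211"]]'), a record can be pulled out by its ID and then used like the packaged EnsDb. This is a rough sketch, assuming network access and that record AH60977 is still available in the snapshot you query; the names `ah` and `edb` are just illustrative.

```r
# Sketch only: assumes network access and that AH60977 is still served.
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

# Fetch the Ensembl 92 EnsDb listed above (downloaded once, then cached).
edb <- ah[["AH60977"]]

# Check which Ensembl release the database was built from.
ensemblVersion(edb)

# Map IDs exactly as with EnsDb.Hsapiens.v86.
mapIds(edb, keys="ENSG00000239975", keytype="GENEID", column="SYMBOL")
```

If the ID is not found, rerunning the query shown above should list the records available for the current snapshot.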
written 24 days ago by Johannes Rainer

Sure thing. I have already informed my collaborator about the use of v86, but I will confirm it once more. Thanks.

written 23 days ago by vd4mmind