Hi,
I am having trouble using AnnotationDbi to convert my Ensembl IDs to gene symbols. I believe the code below should work, but I get a lot of NAs. Some of them are expected, but others do have gene symbols when I look up the ENSG IDs on the Ensembl website, yet they are missing from the new data frame. For example, ENSG00000239975 has the symbol IGKV1D-33 on Ensembl, but mapIds() returns NA for it. Over 200 IDs end up without a gene symbol, and my hunch is that many of them are missing for the same reason. Can you tell me what the problem is and how to resolve it?
I believe the libraries and the underlying databases are up to date, since I am using the latest builds and installing them via biocLite. I also see that the mappings are 1:many, whereas I was expecting 1:1, and I do not understand how to resolve this. I looked through a number of blog posts and could not find anything wrong with my code. What am I missing? In addition, select() does not work for me and throws an error. How can this be done properly?
The object I am running this on holds the differentially expressed genes from edgeR (a topTags object). It contains only the DEGs that pass my thresholds, so it does not cover all expressed genes, just about 2.6k genes.
degs.DN.v3 <- degs.DN.v1
gene.ids <- AnnotationDbi::mapIds(org.Hs.eg.db, keys=rownames(degs.DN.v3), keytype="ENSEMBL", column="SYMBOL")
degs.DN.v3$table$genes <- data.frame(ENSEMBL=rownames(degs.DN.v3), SYMBOL=gene.ids)

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] vivlib_1.0.0 devtools_1.13.5 geneplotter_1.58.0
[4] annotate_1.58.0 XML_3.98-1.11 Hmisc_4.1-1
[7] Formula_1.2-3 survival_2.41-3 lattice_0.20-35
[10] plotly_4.7.1 Glimma_1.8.2 gridExtra_2.3
[13] dplyr_0.7.5 sva_3.28.0 genefilter_1.62.0
[16] mgcv_1.8-23 nlme_3.1-137 edgeR_3.22.2
[19] DESeq2_1.20.0 SummarizedExperiment_1.10.1 DelayedArray_0.6.0
[22] BiocParallel_1.14.1 GenomicRanges_1.32.3 GenomeInfoDb_1.16.0
[25] limma_3.36.1 BiocInstaller_1.30.0 pathview_1.20.0
[28] org.Hs.eg.db_3.6.0 AnnotationDbi_1.42.1 IRanges_2.14.10
[31] S4Vectors_0.18.2 Biobase_2.40.0 BiocGenerics_0.26.0
[34] VennDiagram_1.6.20 futile.logger_1.4.3 GeneOverlap_1.16.0
[37] DOSE_3.6.0 clusterProfiler_3.8.1 dendsort_0.3.3
[40] statmod_1.4.30 viridis_0.5.1 viridisLite_0.3.0
[43] matrixStats_0.53.1 tibble_1.4.2 pheatmap_1.0.10
[46] ggplot2_2.2.1 gplots_3.0.1 scales_0.5.0
[49] RColorBrewer_1.1-2 reshape2_1.4.3 plyr_1.8.4
[52] tidyr_0.8.1

loaded via a namespace (and not attached):
[1] backports_1.1.2 fastmatch_1.1-0 igraph_1.2.1 lazyeval_0.2.1
[5] splines_3.5.0 digest_0.6.15 htmltools_0.3.6 GOSemSim_2.6.0
[9] GO.db_3.6.0 gdata_2.18.0 magrittr_1.5 checkmate_1.8.5
[13] memoise_1.1.0 cluster_2.0.7-1 Biostrings_2.48.0 enrichplot_1.0.2
[17] colorspace_1.3-2 blob_1.1.1 ggrepel_0.8.0 jsonlite_1.5
[21] RCurl_1.95-4.10 graph_1.58.0 bindr_0.1.1 glue_1.2.0
[25] gtable_0.2.0 zlibbioc_1.26.0 XVector_0.20.0 UpSetR_1.3.3
[29] Rgraphviz_2.24.0 futile.options_1.0.1 DBI_1.0.0 Rcpp_0.12.17
[33] xtable_1.8-2 htmlTable_1.12 units_0.5-1 foreign_0.8-70
[37] bit_1.1-14 htmlwidgets_1.2 httr_1.3.1 fgsea_1.6.0
[41] acepack_1.4.1 pkgconfig_2.0.1 nnet_7.3-12 locfit_1.5-9.1
[45] labeling_0.3 tidyselect_0.2.4 rlang_0.2.1 munsell_0.4.3
[49] tools_3.5.0 RSQLite_2.1.1 ggridges_0.5.0 stringr_1.3.1
[53] yaml_2.1.19 knitr_1.20 bit64_0.9-7 caTools_1.17.1
[57] purrr_0.2.5 KEGGREST_1.20.0 ggraph_1.0.1 bindrcpp_0.2.2
[61] formatR_1.5 KEGGgraph_1.40.0 DO.db_2.9 compiler_3.5.0
[65] rstudioapi_0.7 curl_3.2 png_0.1-7 tweenr_0.1.5
[69] stringi_1.1.7 Matrix_1.2-14 pillar_1.2.3 data.table_1.11.4
[73] cowplot_0.9.2 bitops_1.0-6 qvalue_2.12.0 R6_2.2.2
[77] latticeExtra_0.6-28 KernSmooth_2.23-15 lambda.r_1.2.3 MASS_7.3-49
[81] gtools_3.5.0 assertthat_0.2.0 withr_2.1.2 GenomeInfoDbData_1.1.0
[85] udunits2_0.13 rpart_4.1-13 rvcheck_0.1.0 git2r_0.21.0
[89] ggforce_0.1.2 base64enc_0.1-3
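On the 1:many message and the select() error raised in the question, here is a minimal sketch that assumes the same org.Hs.eg.db setup. mapIds() already returns a single value per key (multiVals = "first" is its default), so the 1:many message is informational and does not mean the result is 1:many. One possible cause of the select() error, not confirmed anywhere in this thread, is that dplyr (also attached in the session) masks select(); calling AnnotationDbi::select() explicitly sidesteps that. The second key (ENSG00000141510, TP53) is added only for illustration.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# The ID from the question plus one illustrative key (TP53).
ids <- c("ENSG00000239975", "ENSG00000141510")

# One symbol per key; keys without a mapping come back as NA.
# multiVals = "first" is the default and is spelled out here for clarity.
sym <- AnnotationDbi::mapIds(org.Hs.eg.db, keys = ids,
                             keytype = "ENSEMBL", column = "SYMBOL",
                             multiVals = "first")

# Ensembl IDs that org.Hs.eg.db could not map to a symbol.
unmapped <- names(sym)[is.na(sym)]

# Namespace-qualified select(), so dplyr::select() cannot be picked up instead.
# Unlike mapIds(), this returns every match, so rows can exceed length(ids).
tab <- AnnotationDbi::select(org.Hs.eg.db, keys = ids,
                             keytype = "ENSEMBL", columns = "SYMBOL")
```

Inspecting unmapped gives the list of IDs to check against Ensembl directly.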
Makes sense. Yes, these are Entrez ID based. Thanks a lot. I will take a look and try the Ensembl-based package.
P.S: removed the edgeR tag.
Perfect, it works, and yes, it is the Ensembl annotation that I should be using for this project. I did not have that information from my collaborators, which is why I had the discrepancy. Thanks.
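For readers who land here later, the Ensembl-based lookup can be sketched roughly as follows. The exact code used in the answer is not reproduced in this thread, so this is an assumption: it uses the EnsDb.Hsapiens.v86 package (Ensembl release 86) with the GENEID keytype and the SYMBOL column, and the second key (TP53) is illustrative only.

```r
library(ensembldb)
library(EnsDb.Hsapiens.v86)

# The ID from the question plus one illustrative key (TP53).
ids <- c("ENSG00000239975", "ENSG00000141510")

# In an EnsDb, Ensembl gene IDs are stored under the "GENEID" keytype.
sym <- AnnotationDbi::mapIds(EnsDb.Hsapiens.v86, keys = ids,
                             keytype = "GENEID", column = "SYMBOL")

# Same layout as the data frame built in the question.
genes <- data.frame(ENSEMBL = ids, SYMBOL = unname(sym),
                    stringsAsFactors = FALSE)
```

Because the symbols come from Ensembl itself rather than through Entrez Gene, IDs that exist only in Ensembl can still be annotated, provided they are present in release 86.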
Just one more note on the EnsDb databases/packages. The package Aaron was using above is based on Ensembl release 86. You should make sure to use annotations from the same Ensembl release throughout your analysis. While the release 86 EnsDb is the most recent one provided as an annotation package, you can get EnsDbs for more recent releases from AnnotationHub.
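A minimal sketch of fetching a newer EnsDb from AnnotationHub, under stated assumptions: the release number below is purely illustrative, and the title pattern used to pick a record ("Ensembl <release> EnsDb ...") is an assumption about how the hub titles its records. Print hits and choose the record by hand if the pattern does not match.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()  # needs network access; results are cached locally

# All EnsDb resources for human; print `hits` to see the available releases.
hits <- query(ah, c("EnsDb", "Homo sapiens"))

# Illustrative release number, not taken from the thread.
release <- 92

# Assumed title format: "Ensembl <release> EnsDb for ...".
sel <- hits[grepl(paste0("Ensembl ", release, " "), hits$title)]
stopifnot(length(sel) >= 1)

# Download (or load from cache) the selected EnsDb.
edb <- sel[[1]]
```

The resulting edb object can then be passed to mapIds() in place of the packaged EnsDb.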
Sure thing. I have already informed my collaborator about the use of v86, and I will confirm it once again. Thanks.