Get the latest gene name from Entrez ID
sgupt46 • 0
@sgupt46-13716
Canada

Is there a way to get the latest gene symbol for an Entrez ID in R? I am using AnnotationDbi and org.Hs.eg.db, but they seem to return an outdated gene name. For Entrez ID 64755, the current gene symbol is RUSF1, but I get C16orf58:

library(AnnotationDbi)
library(org.Hs.eg.db)
AnnotationDbi::select(org.Hs.eg.db, "64755", "SYMBOL", "ENTREZID")
  ENTREZID   SYMBOL
1    64755 C16orf58

sessionInfo( )
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] org.Hs.eg.db_3.11.4  AnnotationDbi_1.50.3 IRanges_2.22.2      
[4] S4Vectors_0.26.1     Biobase_2.50.0       BiocGenerics_0.36.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6      DBI_1.1.1       RSQLite_2.2.2   rlang_0.4.10   
 [5] cachem_1.0.4    blob_1.2.1      vctrs_0.3.6     bit64_4.0.5    
 [9] bit_4.0.4       fastmap_1.1.0   compiler_4.0.3  pkgconfig_2.0.3
[13] memoise_2.0.0
AnnotationDbi org.Hs.eg.db • 2.4k views

Thanks guys. I found that BioMart can also do this.

library(biomaRt)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
biomaRt::select(mart, keys = "64755", columns = "hgnc_symbol", keytype = "entrezgene_id")
  hgnc_symbol
1       RUSF1
@gordon-smyth
WEHI, Melbourne, Australia

Short Answer

Use the current version of Bioconductor. You're using Bioconductor 3.11 instead of Bioconductor 3.12, so you get the gene annotation as it stood at the time of the Bioconductor 3.11 release (April 2020). The gene you show has been renamed since then, and the current version of org.Hs.eg.db has the new name.

Longer Answer

The organism packages (such as org.Hs.eg.db) are updated every six months at the time of the Bioconductor releases. If you instead want gene symbols that are up to date to the day, then download the gene_info file from NCBI (from which org.Hs.eg.db is created). You could also use alias2SymbolUsingNCBI in the limma package.

You can get the gene_info file from

https://www.ncbi.nlm.nih.gov/public/

The file itself is

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
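As a rough illustration of the gene_info route, here is a minimal base-R sketch that looks up the current symbol for an Entrez ID in a gene_info-style table. The helper names `read_gene_info` and `entrez_to_symbol` are hypothetical, and the two-line `toy_lines` input is a made-up stand-in containing only the gene from this thread and a few of the real file's tab-separated columns. It assumes the real file has a header line starting with `#tax_id` and columns named `GeneID` and `Symbol`.

```r
# Toy stand-in for Homo_sapiens.gene_info (assumption: the real file is
# tab-separated, has many more columns, and its header starts with '#').
toy_lines <- c(
  "#tax_id\tGeneID\tSymbol\tSynonyms",
  "9606\t64755\tRUSF1\tC16orf58"
)

# Hypothetical helper: read a gene_info-style table from a path or connection.
read_gene_info <- function(file) {
  # comment.char = "" keeps the '#'-prefixed header line;
  # quote = "" stops quote characters in free-text fields from merging rows;
  # everything is read as character so IDs are compared as strings.
  read.delim(file, comment.char = "", quote = "", check.names = FALSE,
             colClasses = "character", stringsAsFactors = FALSE)
}

# Hypothetical helper: return the Symbol for each Entrez ID (NA if absent).
entrez_to_symbol <- function(gene_info, ids) {
  gene_info$Symbol[match(as.character(ids), gene_info$GeneID)]
}

gene_info <- read_gene_info(textConnection(toy_lines))
entrez_to_symbol(gene_info, c("64755", "0"))
```

With a downloaded copy of the real file, the same call would presumably be `read_gene_info("Homo_sapiens.gene_info.gz")`, since R's file readers decompress gzip files transparently. IDs that are not in the table come back as `NA` rather than raising an error.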

shepherl 3.8k
@lshep
United States

You could use AnnotationHub resources. The AHEnsDbs package provides recent Ensembl annotations, which include Entrez IDs.

> library(AnnotationHub)
> ah = AnnotationHub()
> query(ah, c("org", "homo"))
> query(ah, c("AHEnsDbs", "homo"))
AnnotationHub with 18 records
# snapshotDate(): 2021-03-15
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

            title                             
  AH53211 | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681 | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757 | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773 | Ensembl 91 EnsDb for Homo Sapiens 
  ...       ...                               
  AH78783 | Ensembl 99 EnsDb for Homo sapiens 
  AH79689 | Ensembl 100 EnsDb for Homo sapiens
  AH83216 | Ensembl 101 EnsDb for Homo sapiens
  AH89180 | Ensembl 102 EnsDb for Homo sapiens
  AH89426 | Ensembl 103 EnsDb for Homo sapiens
> temp = ah[["AH89426"]]
loading from cache
require("ensembldb")
> select(temp, "64755", "SYMBOL", "ENTREZID")
  ENTREZID SYMBOL
1    64755  RUSF1