Since this is my first time posting a question, I'd appreciate it if you could answer politely.
I have a problem translating Ensembl IDs to Entrez Gene IDs. I've tried the script provided below, and although it runs without errors, it returns only "NA" in the entrezgene_id column. What might be the issue?
The genes you present are orthologous mappings from D. rerio to other teleost fishes, so it is probably going to be difficult to map them to NCBI Gene IDs. As an example, the first gene is etv5b, and if you search NCBI for that symbol, stickleback isn't even listed among the results. An even more targeted search returns nothing at all.
My general rule is that you should never try to map between Ensembl and NCBI IDs unless absolutely necessary, because there are any number of reasons why what appears to be a simple mapping is not simple at all.
I believe you need NCBI Gene IDs for KEGG, in which case you may need to map. The three genes you have shown here don't map to NCBI Gene IDs, and those that carry a symbol appear to be named after orthologs in either D. rerio or H. sapiens. As an example,
> library(AnnotationHub)
> hub <- AnnotationHub()
> zz <- hub[["AH116275"]]
downloading 1 resources
retrieving 1 resource
|===========================| 100%
loading from cache
require("ensembldb")
Warning message:
package 'GenomeInfoDb' was built under R version 4.3.2
> zz
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.10
|Creation time: Mon Jan 15 16:00:24 2024
|ensembl_version: 111
|ensembl_host: localhost
|Organism: Gasterosteus aculeatus
|taxonomy_id: 69293
|genome_build: BROADS1
|DBSCHEMAVERSION: 2.2
|common_name: three-spined stickleback
|species: gasterosteus_aculeatus
| No. of genes: 22456.
| No. of transcripts: 29245.
|Protein data available.
> select(zz, genes, c("GENEID","SYMBOL","ENTREZID"))
              GENEID SYMBOL ENTREZID
1 ENSGACG00000014473  etv5b       NA
2 ENSGACG00000015168              NA
3 ENSGACG00000007529  CNNM1       NA
> gns2 <- tolower(mapIds(zz, genes, "SYMBOL","GENEID"))
> gns2
ENSGACG00000014473
"etv5b"
ENSGACG00000015168
""
ENSGACG00000007529
"cnnm1"
> library(org.Dr.eg.db)
> select(org.Dr.eg.db, gns2, "ENTREZID", "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID
1  etv5b    30452
2            <NA>
3  cnnm1   562504
And then maybe you could do the KEGG analysis based on D. rerio instead?
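To make the symbol-bridging step above easier to reuse, here is a minimal base-R sketch. It assumes you already have the lowercased symbols (like gns2) and a symbol-to-Entrez table (like the select() result on org.Dr.eg.db); the values are copied from the transcript so the block runs without Bioconductor. The helper name bridge_by_symbol is made up for illustration and is not part of any package.

```r
# Values copied from the transcript above. In practice, gns2 would come from
# mapIds() on the EnsDb and dr_map from select() on org.Dr.eg.db.
gns2 <- c(ENSGACG00000014473 = "etv5b",
          ENSGACG00000015168 = "",
          ENSGACG00000007529 = "cnnm1")
dr_map <- data.frame(SYMBOL = c("etv5b", "cnnm1"),
                     ENTREZID = c("30452", "562504"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: bridge stickleback Ensembl IDs to zebrafish Entrez IDs
# through shared gene symbols. Genes without a symbol get NA, not a match.
bridge_by_symbol <- function(symbols, lookup) {
  has_symbol <- !is.na(symbols) & nzchar(symbols)
  idx <- match(symbols, lookup$SYMBOL)
  idx[!has_symbol] <- NA
  data.frame(GENEID = names(symbols),
             SYMBOL = unname(ifelse(has_symbol, symbols, NA_character_)),
             ENTREZID = lookup$ENTREZID[idx],
             row.names = NULL,
             stringsAsFactors = FALSE)
}

bridged <- bridge_by_symbol(gns2, dr_map)

# Unique zebrafish Entrez IDs, with unmapped genes dropped.
kegg_input <- unique(bridged$ENTREZID[!is.na(bridged$ENTREZID)])
```

The resulting kegg_input vector could then go to a KEGG tool using the zebrafish organism code "dre", for example limma's kegga() with species.KEGG = "dre". Keep in mind that matching on symbols is only a heuristic: a shared symbol suggests orthology but does not guarantee it, and genes without a symbol are lost.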
Can you provide an example of the Ensembl IDs you're trying to convert?
Thank you for the reply. The examples are these.
ENSGACG00000014473 ENSGACG00000015168 ENSGACG00000007529