This doesn't require getLDS, because that is intended for mapping between species. You just want to use regular getBM instead. You might also try an EnsDb.
> library(biomaRt)
> library(org.Hs.eg.db)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> symb <- keys(org.Hs.eg.db, "SYMBOL")
> head(symb)
[1] "A1BG" "A2M" "A2MP1" "NAT1" "NAT2" "NATP"
> z <- getBM(c("ensembl_gene_id","hgnc_symbol"), "hgnc_symbol", symb, mart)
> head(z)
  ensembl_gene_id hgnc_symbol
1 ENSG00000121410        A1BG
2 ENSG00000175899         A2M
3 ENSG00000256069       A2MP1
4 ENSG00000114771       AADAC
5 ENSG00000127837        AAMP
6 ENSG00000129673       AANAT
> dim(z)
[1] 44160 2
> length(symb)
[1] 66091
## you are right - not all symbols map. Let's try an EnsDb
> library(AnnotationHub)
Loading required package: BiocFileCache
Loading required package: dbplyr
Attaching package: 'AnnotationHub'
The following object is masked from 'package:Biobase':
cache
> hub <- AnnotationHub()
|======================================================================| 100%
snapshotDate(): 2022-04-21
> query(hub, c("homo sapiens","ensdb"))
AnnotationHub with 21 records
# snapshotDate(): 2022-04-21
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'
title
AH53211 | Ensembl 87 EnsDb for Homo Sapiens
AH53715 | Ensembl 88 EnsDb for Homo Sapiens
AH56681 | Ensembl 89 EnsDb for Homo Sapiens
AH57757 | Ensembl 90 EnsDb for Homo Sapiens
AH60773 | Ensembl 91 EnsDb for Homo Sapiens
... ...
AH89180 | Ensembl 102 EnsDb for Homo sapiens
AH89426 | Ensembl 103 EnsDb for Homo sapiens
AH95744 | Ensembl 104 EnsDb for Homo sapiens
AH98047 | Ensembl 105 EnsDb for Homo sapiens
AH100643 | Ensembl 106 EnsDb for Homo sapiens
> ensdb <- hub[["AH100643"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
loading from cache
require("ensembldb")
> zz <- select(ensdb, symb, "GENEID", "GENENAME")
> head(zz)
  GENENAME          GENEID
1     A1BG ENSG00000121410
2      A2M ENSG00000175899
3      A2M         LRG_591
4    A2MP1 ENSG00000256069
5     NAT1 ENSG00000171428
6     NAT2 ENSG00000156006
> dim(zz)
[1] 45707 2
## Still not 100% mapping
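Note that the row count is not the same thing as the number of mapped symbols: one symbol can return more than one ID (A2M above comes back with both an Ensembl gene ID and an LRG ID). A rough sketch of a fairer tally follows. It uses a small hand-made data frame that copies the first rows of zz shown above; the names toy_zz and toy_symb are made up for the illustration, and AAVS1 is added as a symbol that does not map.

```r
# Toy stand-ins for 'zz' and 'symb'; values copied from the output above.
toy_zz <- data.frame(
  GENENAME = c("A1BG", "A2M", "A2M", "A2MP1", "NAT1", "NAT2"),
  GENEID   = c("ENSG00000121410", "ENSG00000175899", "LRG_591",
               "ENSG00000256069", "ENSG00000171428", "ENSG00000156006"),
  stringsAsFactors = FALSE
)
toy_symb <- c("A1BG", "A2M", "A2MP1", "NAT1", "NAT2", "AAVS1")

# Keep only Ensembl gene IDs (drop the LRG_ entries).
ens_only <- toy_zz[grepl("^ENSG", toy_zz$GENEID), ]

# Count distinct symbols rather than rows.
n_mapped <- length(unique(ens_only$GENENAME))

# Fraction of the input symbols that have an Ensembl gene ID.
n_mapped / length(toy_symb)
```

On the real objects the same idea would be applied to zz and symb instead of the toy versions.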
The underlying reason is that this is a mapping between HGNC and EBI/EMBL (Ensembl), and not everything that has a symbol is considered a gene by EBI/EMBL. For example:
> head(symb[!symb %in% z[,2]])
[1] "AAVS1" "ACLS" "ACTBP3" "ACTG1P6" "ACTG1P7" "ACTG1P8"
AAVS1 has no Ensembl gene ID, whereas A1BG, for example, does.
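If you want one result row per input symbol, with NA where no Ensembl gene ID exists, a left join in base R is one way to get it. This is only a sketch on toy data: toy_symb and toy_z are hypothetical stand-ins for symb and z, filled with values taken from the output above.

```r
# Toy stand-ins for 'symb' and the getBM() result 'z'.
toy_symb <- c("A1BG", "A2M", "A2MP1", "AAVS1")
toy_z <- data.frame(
  ensembl_gene_id = c("ENSG00000121410", "ENSG00000175899", "ENSG00000256069"),
  hgnc_symbol     = c("A1BG", "A2M", "A2MP1"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every input symbol and fills in NA
# where the mapping table has no matching row.
full <- merge(data.frame(hgnc_symbol = toy_symb, stringsAsFactors = FALSE),
              toy_z, by = "hgnc_symbol", all.x = TRUE)

# Symbols that came back without an Ensembl gene ID.
unmapped <- full$hgnc_symbol[is.na(full$ensembl_gene_id)]
unmapped
```

With the real data this would be merge(data.frame(hgnc_symbol = symb), z, all.x = TRUE). Symbols that map to several IDs will still produce several rows.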