using biomaRt to look up human gene symbols and map them to human ensembl ENSG IDs

hello bioconductor community,

firstly, to those who did - thanks for developing and maintaining biomaRt - i use it often and it's a great resource.

recently, i have been using biomaRt to look up human gene symbols from public RNAseq data and map them to human ENSG IDs. however, when i try to look up a list of about 30k symbols, only about 20k find matches using the getLDS() function as below. Do you know why this might be? it may look a little odd as i am designing it to look up gene symbols across species.

mrt = useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
tbl.match = getLDS(attributes = "ensembl_gene_id", mart = mrt, filters = "ensembl_gene_id", values = HumanGeneSymbolsFromRNAseq, martL = mrt)

it appears that most of the "conventional", well-studied/named genes map, but there are many non-coding genes, pseudogenes and others that have gene symbols that do not appear in the biomaRt lookup - are there different versions of gene symbols that researchers may be using that do not map in biomaRt?

thanks for any insight you might have,



This doesn't require getLDS, because that is intended for mapping between species. You just want to use regular getBM instead. You might also try an EnsDb.

> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
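> library(org.Hs.eg.db)  ## assumed: the annotation object is missing from the original call
> symb <- keys(org.Hs.eg.db, "SYMBOL")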
> head(symb)
[1] "A1BG"  "A2M"   "A2MP1" "NAT1"  "NAT2"  "NATP" 
> z <- getBM(c("ensembl_gene_id","hgnc_symbol"), "hgnc_symbol", symb, mart)
> head(z)
  ensembl_gene_id hgnc_symbol
1 ENSG00000121410        A1BG
2 ENSG00000175899         A2M
3 ENSG00000256069       A2MP1
4 ENSG00000114771       AADAC
5 ENSG00000127837        AAMP
6 ENSG00000129673       AANAT
> dim(z)
[1] 44160     2
> length(symb)
[1] 66091

## you are right - not all symbols map. Let's try an EnsDb
> library(AnnotationHub)
Loading required package: BiocFileCache
Loading required package: dbplyr

Attaching package: 'AnnotationHub'

The following object is masked from 'package:Biobase':

    cache

> hub <- AnnotationHub()
  |======================================================================| 100%

snapshotDate(): 2022-04-21
> query(hub, c("homo sapiens","ensdb"))
AnnotationHub with 21 records
# snapshotDate(): 2022-04-21
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

  AH53211  | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715  | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681  | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757  | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773  | Ensembl 91 EnsDb for Homo Sapiens 
  ...        ...                               
  AH89180  | Ensembl 102 EnsDb for Homo sapiens
  AH89426  | Ensembl 103 EnsDb for Homo sapiens
  AH95744  | Ensembl 104 EnsDb for Homo sapiens
  AH98047  | Ensembl 105 EnsDb for Homo sapiens
  AH100643 | Ensembl 106 EnsDb for Homo sapiens
> ensdb <- hub[["AH100643"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
> zz <- select(ensdb, symb, "GENEID", "GENENAME")
> head(zz)
  GENENAME          GENEID
1     A1BG ENSG00000121410
2      A2M ENSG00000175899
3      A2M         LRG_591
4    A2MP1 ENSG00000256069
5     NAT1 ENSG00000171428
6     NAT2 ENSG00000156006
> dim(zz)
[1] 45707     2

## Still not 100% mapping

This is a mapping between HGNC and EBI/EMBL, and not every gene that has a symbol is considered a gene by EBI/EMBL, so some symbols have no Ensembl gene ID to map to. As an example:

> head(symb[!symb %in% z[,2]])
[1] "AAVS1"   "ACLS"    "ACTBP3"  "ACTG1P6" "ACTG1P7" "ACTG1P8"

And AAVS1 has no Ensembl Gene ID, but for example A1BG does.
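
If it helps, here is a minimal base R sketch of how one might tidy up a lookup table like the ones above: drop the LRG records, list the symbols that have no ENSG ID, and build one row per input symbol. The toy values are copied from the output above (AAVS1 is the unmapped example from the getBM result); the object names `tbl`, `ensg`, `unmapped`, `multi` and `res` are just placeholders, and in practice `tbl` would be the data.frame returned by select() or getBM().

```r
## Toy inputs copied from the output above; replace with your own
## symbol vector and the table returned by select() or getBM().
symb <- c("A1BG", "A2M", "A2MP1", "AAVS1")
tbl <- data.frame(
  GENENAME = c("A1BG", "A2M", "A2M", "A2MP1"),
  GENEID   = c("ENSG00000121410", "ENSG00000175899",
               "LRG_591", "ENSG00000256069"),
  stringsAsFactors = FALSE
)

## Keep only ENSG identifiers (this drops LRG records such as LRG_591).
ensg <- tbl[grepl("^ENSG", tbl$GENEID), ]

## Symbols with no Ensembl gene ID at all.
unmapped <- setdiff(symb, ensg$GENENAME)

## Symbols that still map to more than one ENSG ID after filtering.
n_ids <- table(ensg$GENENAME)
multi <- names(n_ids)[n_ids > 1]

## One row per input symbol, NA where there is no match.
## match() takes the first hit, so inspect `multi` before relying on it.
res <- data.frame(
  symbol = symb,
  ensembl_gene_id = ensg$GENEID[match(symb, ensg$GENENAME)],
  stringsAsFactors = FALSE
)
```

Keeping `unmapped` around makes it easy to inspect which kinds of symbols are failing, rather than silently losing them in the join.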
