Question: biomaRt Ensembl gene ID to multiple HGNC symbol
6 weeks ago by
foehn60
foehn60 wrote:

Hi,

I'm using the R package biomaRt to map Ensembl gene IDs to HGNC symbols. I find that some Ensembl IDs map to multiple symbols. For example,

library(biomaRt)
mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id", values = c("ENSG00000187510", "ENSG00000230417", "ENSG00000276085"), mart = mart)
  ensembl_gene_id hgnc_symbol
1 ENSG00000187510    C12orf74
2 ENSG00000187510     PLEKHG7
3 ENSG00000230417   LINC00595
4 ENSG00000230417   LINC00856
5 ENSG00000276085      CCL3L1
6 ENSG00000276085      CCL3L3

> packageVersion("biomaRt")
[1] ‘2.38.0’


This is unsurprising, given that we don't expect a 1:1 mapping. What is confusing, however, is that if I query those IDs on the Ensembl website, I get exactly one symbol for each. That is,

ENSG00000187510 -> C12orf74
ENSG00000230417 -> LINC00856
ENSG00000276085 -> CCL3L1


In theory, biomaRt is just running an SQL query against the online Ensembl database, so we should expect the same results given the same version of the database. So I want to know why there is this discrepancy.

Thanks,

biomart ensembl symbol hgnc • 113 views
modified 6 weeks ago by James W. MacDonald50k • written 6 weeks ago by foehn60
Answer: biomaRt Ensembl gene ID to multiple HGNC symbol
6 weeks ago by
United States
James W. MacDonald50k wrote:

When you map to the HGNC symbol, you are asking for an external reference. In other words, you are asking which symbols the HUGO consortium says map to this Ensembl ID. You can see them here, and they include both of the symbols you get from the Biomart server.

Right. But I'm curious why http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000230417;r=10:78179185-78551355 returns LINC00856 as the Name in the Summary section. Does that imply that Ensembl regards LINC00856 as a more canonical symbol than the other?

That's a question for EBI/EMBL, no? I'm not sure why you would think anybody at the Bioconductor support site would have any particular insight as to their thinking about what symbol is more or less canonical than any other.

Good idea.

According to Ensembl's reply, they arbitrarily pick one HGNC synonym for the summary when there are multiple.
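
Since the symbol shown on the website is an arbitrary pick, one option when a single row per gene is needed is to keep every symbol rather than choose one. The following is only a minimal sketch in base R: the data frame `res` is typed in by hand from the getBM() output in the question so that the example runs without a network connection, and the names `res` and `collapsed` and the ";" separator are illustrative assumptions, not anything biomaRt prescribes.

```r
# Rows copied by hand from the getBM() output shown in the question;
# in practice `res` would be the data frame returned by getBM().
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7", "LINC00595", "LINC00856",
                  "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# Collapse to one row per Ensembl ID, joining all of its symbols with ";".
collapsed <- aggregate(hgnc_symbol ~ ensembl_gene_id, data = res,
                       FUN = paste, collapse = ";")
print(collapsed)
```

Keeping all symbols avoids silently committing to one of them; if a single symbol per gene is really required, the choice should be made explicitly and documented, since neither mapping is more authoritative than the other.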