Question: biomaRt Ensembl gene ID to multiple HGNC symbol
foehn wrote (4 months ago):

Hi,

I'm using the R package biomaRt to map Ensembl gene IDs to HGNC symbols. I find that some Ensembl IDs map to multiple symbols. For example,

library(biomaRt)
mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
      filters = "ensembl_gene_id",
      values = c("ENSG00000187510", "ENSG00000230417", "ENSG00000276085"),
      mart = mart)
  ensembl_gene_id hgnc_symbol
1 ENSG00000187510    C12orf74
2 ENSG00000187510     PLEKHG7
3 ENSG00000230417   LINC00595
4 ENSG00000230417   LINC00856
5 ENSG00000276085      CCL3L1
6 ENSG00000276085      CCL3L3

> packageVersion("biomaRt")
[1] ‘2.38.0’

This is unsurprising, given that we don't expect a 1:1 mapping. What confuses me is that if I look up those IDs on the Ensembl website, I get exactly one symbol for each. That is,

ENSG00000187510 -> C12orf74
ENSG00000230417 -> LINC00856
ENSG00000276085 -> CCL3L1

In theory, what is behind biomaRt is just an SQL query against the online Ensembl database, so we should expect the same results given the same version of the database. So I want to know why we get this discrepancy.

Thanks,

Answer: biomaRt Ensembl gene ID to multiple HGNC symbol
James W. MacDonald wrote (4 months ago):

When you map to the HGNC symbol, you are asking for an external reference. In other words, you are asking which symbols the HUGO Gene Nomenclature Committee (HGNC) says map to this Ensembl ID, which you can see here, and which include both of the symbols you get from the BioMart server.
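
To check which IDs carry more than one HGNC cross-reference, one option is to count the distinct symbols per ID in the getBM() result. The following is a minimal sketch in base R. It assumes the result has the two columns shown in the question, and it hard-codes that output as a data frame (here called `res`, an illustrative name) so that it runs without a connection to the BioMart server.

```r
# Output of the getBM() call from the question, hard-coded for illustration
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7", "LINC00595",
                  "LINC00856", "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# Number of distinct HGNC symbols attached to each Ensembl gene ID
n_symbols <- tapply(res$hgnc_symbol, res$ensembl_gene_id,
                    function(x) length(unique(x)))

# Ensembl gene IDs with a one-to-many mapping
multi_ids <- names(n_symbols)[n_symbols > 1]
print(multi_ids)
```

With the data from the question, all three IDs end up in `multi_ids`, since each has two symbols.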


Right. But I'm curious why http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000230417;r=10:78179185-78551355 shows LINC00856 as the Name in the Summary section. Does that imply that Ensembl regards LINC00856 as a more canonical symbol than the other?

written 4 months ago by foehn

That's a question for EBI/EMBL, no? I'm not sure why you would think anybody at the Bioconductor support site would have any particular insight as to their thinking about what symbol is more or less canonical than any other.

written 4 months ago by James W. MacDonald

Good idea.

written 4 months ago by foehn

According to Ensembl's reply, when a gene has multiple HGNC symbols, they arbitrarily pick one of them for the Summary.

written 4 months ago by foehn
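
Since the symbol shown in the Summary is an arbitrary pick, one way to avoid relying on it is to keep every symbol and collapse them into one row per Ensembl ID. The following is a minimal sketch in base R, not the approach of either poster. It assumes a result with the two columns shown in the question, hard-coded here as `res` so that it runs offline. The ";" separator and the `collapsed` name are arbitrary choices.

```r
# Output of the getBM() call from the question, hard-coded for illustration
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7", "LINC00595",
                  "LINC00856", "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# One row per Ensembl gene ID, with all of its symbols joined by ";"
collapsed <- aggregate(
  hgnc_symbol ~ ensembl_gene_id,
  data = res,
  FUN = function(x) paste(sort(unique(x)), collapse = ";")
)
print(collapsed)
```

This keeps the mapping lossless. If a single symbol per gene is needed downstream, the choice can then be made explicitly rather than inherited from an arbitrary pick.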