I'm using the R package biomaRt to map Ensembl gene IDs to HGNC symbols, and I found that some Ensembl IDs map to more than one symbol. For example:
```r
library(biomaRt)

mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
      filters = "ensembl_gene_id",
      values = c("ENSG00000187510", "ENSG00000230417", "ENSG00000276085"),
      mart = mart)
#   ensembl_gene_id hgnc_symbol
# 1 ENSG00000187510    C12orf74
# 2 ENSG00000187510     PLEKHG7
# 3 ENSG00000230417   LINC00595
# 4 ENSG00000230417   LINC00856
# 5 ENSG00000276085      CCL3L1
# 6 ENSG00000276085      CCL3L3

packageVersion("biomaRt")
# [1] ‘2.38.0’
```
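To see the ambiguity programmatically, here is a minimal sketch that works on the `getBM()` result above. It does not query Ensembl: the data frame is typed in from the printed output, and the helper names (`res`, `n_symbols`, `ambiguous`, `collapsed`) are purely illustrative. It counts distinct HGNC symbols per Ensembl ID and then collapses each ID to one row by joining all of its symbols. This keeps every symbol rather than picking one, so it does not reproduce whatever choice the Ensembl website makes.

```r
# Result of the getBM() call above, copied from the printed output (no network access)
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7",
                  "LINC00595", "LINC00856",
                  "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# Number of distinct HGNC symbols returned for each Ensembl ID
n_symbols <- tapply(res$hgnc_symbol, res$ensembl_gene_id,
                    function(s) length(unique(s)))

# Ensembl IDs that map to more than one symbol
ambiguous <- names(n_symbols)[n_symbols > 1]
print(ambiguous)

# One row per Ensembl ID, with all of its symbols joined by ";"
collapsed <- aggregate(hgnc_symbol ~ ensembl_gene_id, data = res,
                       FUN = function(s) paste(unique(s), collapse = ";"))
print(collapsed)
```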
That in itself is not surprising, since we don't expect a 1:1 mapping. What confuses me is that when I look up these IDs on the Ensembl website, each one maps unambiguously to a single symbol:
- ENSG00000187510 -> C12orf74
- ENSG00000230417 -> LINC00856
- ENSG00000276085 -> CCL3L1
In theory, biomaRt is just running SQL queries against the online Ensembl database, so for the same database version we should get the same results. Why do the two disagree?