biomaRt: converting a mouse gene symbol returns multiple human gene symbols
chang02_23 ▴ 20
Last seen 3.5 years ago
United States

I noticed that some mouse gene symbols return multiple human gene symbols; an example is below. If I search for the mouse gene on GeneCards, the correct human homolog of Zfp286 should be ZNF286A, and that of Tmx2 should be TMX2.

Is there a programmatic way to remove the incorrect conversion?

library(biomaRt)
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
genes = c("Zfp286", "Tmx2")
# Map mouse MGI symbols to human HGNC symbols through the linked datasets
genes = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = genes, mart = mouse,
               attributesL = c("hgnc_symbol", "chromosome_name", "start_position"), martL = human,
               uniqueRows = TRUE)
> genes
MGI.symbol HGNC.symbol Chromosome.Name Gene.Start..bp.
1     Zfp286     ZNF286A              17        15699577
2     Zfp286     ZNF286B              17        18658429
3       Tmx2        TMX2              11        57712600
4       Tmx2 TMX2-CTNND1              11        57712605

biomart gene symbol • 9.7k views

Why do you say those are the correct homologs? Why is ZNF286B not a homolog of Zfp286?


Based on the MGI database, the listed homolog is ZNF286A.


Actually, I decided to add this as an answer.

Diego Diez ▴ 760
Last seen 11 months ago

Homology is defined in terms of ancestry and is usually inferred from sequence similarity (and there are many different methods to do so). Just because one site says two genes are homologs does not necessarily make it true. There are at least two possibilities that come to my mind (see also the trees below):

  1. First speciation, then duplication. In that case a version of the Znf286 gene existed in an ancestral species before the mouse/human split, so each species contained one copy. Then the human copy duplicated. In that sense Zfp286 in mouse is an ortholog of both human genes (this is the information shown in Ensembl).
  2. Alternatively, Znf286 duplicated in the ancestral species. After the mouse/human split there would then be two copies in each species, but one of the mouse copies was lost. The surviving mouse copy would be an ortholog of only one of the human genes (maybe this is the information shown in the page you linked).

First hypothesis:

         |--Zf (mouse)
       --x               |--ZfA (human)
         |--Zf (human) --o
                         |--ZfB (human)

Second hypothesis:

                 |--ZfA (mouse) --d (e.g. this one was later lost)
         |-------x
         |       |--ZfA (human)
       --o
         |       |--ZfB (mouse)
         |-------x
                 |--ZfB (human)

x: speciation event
o: duplication event
d: extinction event


I am not completely sure how one decides between the two options, but it probably involves information from other species. For example, if Zfp286 duplicated before the mouse/human split (option 2), the duplication would be found in many other species (barring massive gene loss). If instead the human copy duplicated after the split (option 1), most species would have just a 1-1 orthology relation. The latter is indeed what the Ensembl orthologs page for this gene shows.
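To make the "1-1 versus one-to-many" distinction concrete, here is a minimal base R sketch that labels each mouse/human pair in the table from the question by how many partners each side has. The helper name `classify_pairs` and the labels `one2one`, `one2many` and `many2many` are my own illustrative choices (loosely modelled on the vocabulary Ensembl uses), not part of biomaRt, and the labels only describe the mapping table; they say nothing about which evolutionary scenario is true.

```r
# Mapping table copied from the getLDS() output in the question.
mapping <- data.frame(
  MGI.symbol  = c("Zfp286", "Zfp286", "Tmx2", "Tmx2"),
  HGNC.symbol = c("ZNF286A", "ZNF286B", "TMX2", "TMX2-CTNND1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a biomaRt function): label every pair by the
# number of partners on each side of the mapping.
classify_pairs <- function(tbl) {
  tbl <- unique(tbl[, c("MGI.symbol", "HGNC.symbol")])
  tbl <- tbl[tbl$HGNC.symbol != "", , drop = FALSE]  # drop rows without a human symbol
  idx <- seq_len(nrow(tbl))
  n_human <- ave(idx, tbl$MGI.symbol,  FUN = length)  # human partners per mouse gene
  n_mouse <- ave(idx, tbl$HGNC.symbol, FUN = length)  # mouse partners per human gene
  tbl$relation <- ifelse(n_human == 1 & n_mouse == 1, "one2one",
                  ifelse(n_human > 1 & n_mouse > 1, "many2many", "one2many"))
  tbl
}

classified <- classify_pairs(mapping)
print(classified)
```

For the two genes in the question every row comes out as `one2many`, so a filter that keeps only `one2one` rows would drop both genes rather than pick one of the human symbols for you.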

You can look at the information and decide which one you prefer to believe. Many of these mappings are done automatically by software and if you really care about this homology relation you may want to run more specific analyses yourself. HTH.
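If you want to look at Ensembl's own annotation before deciding, something along these lines may help. It is only a sketch: `fetch_human_orthologs` and `keep_top_identity` are hypothetical helpers, and I am assuming the homolog attribute names below are available in the mouse dataset of the Ensembl release you use (check with `listAttributes(mouse)`). Keeping the hit with the highest percent identity is just one possible tie-break, not a statement about which gene is the true ortholog.

```r
# Hypothetical helper: ask the mouse mart for Ensembl's human-homolog annotation.
# Assumes biomaRt is installed and that these attribute names exist in the
# Ensembl release in use; verify them with listAttributes(mouse_mart).
fetch_human_orthologs <- function(mouse_symbols, mouse_mart) {
  biomaRt::getBM(
    attributes = c("external_gene_name",
                   "hsapiens_homolog_associated_gene_name",
                   "hsapiens_homolog_orthology_type",
                   "hsapiens_homolog_perc_id"),
    filters = "mgi_symbol",
    values  = mouse_symbols,
    mart    = mouse_mart
  )
}

# Hypothetical tie-break: per mouse gene, keep the row(s) with the highest
# percent identity. Rows with a missing identity value are dropped.
keep_top_identity <- function(tbl,
                              gene_col = "external_gene_name",
                              id_col   = "hsapiens_homolog_perc_id") {
  best <- ave(tbl[[id_col]], tbl[[gene_col]],
              FUN = function(x) max(x, na.rm = TRUE))
  tbl[which(tbl[[id_col]] == best), , drop = FALSE]
}

# Usage (needs network access, so it is left commented out):
# mouse <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# orth  <- fetch_human_orthologs(c("Zfp286", "Tmx2"), mouse)
# keep_top_identity(orth)
```

Inspecting the orthology type column first is probably wiser than filtering blindly, since a one-to-many call from Ensembl means both human genes are considered orthologs.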

