BioMart does not find mouse homologs of human genes
atakanekiz ▴ 30
@atakanekiz-15874

Hello BC community,

I am trying to convert a list of human genes to their mouse homologs using R. biomaRt finds homologs for some genes but not for others, and I can't figure out why. Example code is below.

library(biomaRt)

# Connect to the human and mouse Ensembl gene datasets
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# Map human gene names to mouse gene names via the linked datasets
getLDS(attributes = c("external_gene_name"),
       filters = "external_gene_name",
       values = c("TP53", "MIR155HG", "STAT1", "PDCD1"), mart = human,
       attributesL = c("external_gene_name"), martL = mouse)

#  Gene.name Gene.name.1
#1      TP53       Trp53
#2     PDCD1       Pdcd1
#3     STAT1       Stat1


I tried the query with other filters and attributes, such as ensembl_gene_id and hgnc_symbol, and the results were the same. I know that MIR155HG should be conserved between human and mouse; this is confirmed by the Mir155hg Ensembl page. I have several other genes like this that don't get mapped to the mouse genome for some reason.

What am I missing here?

biomart homolog ortholog gene conversion ensembl • 732 views

Where on the Ensembl page does it show that MIR155HG is conserved between human and mouse? When I look at the orthologues page it suggests that there are none in any of the 27 primate species - perhaps I'm reading that wrong.


We work on this gene in both human and mouse models. Below is the mouse entry in Ensembl:

http://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000097418;r=16:84703167-84715245

@james-w-macdonald-5106

The biomaRt package is simply a way to programmatically access the Biomart server and get the results back into R. As such, anything like this is really a Biomart issue, not biomaRt (or Bioconductor, really). Anyway, there is a FAQ


Thanks for the answer. I thought something might be going wrong in biomaRt rather than in the actual database. I'll keep digging to see what's up.


If you look at the link you posted above and click on orthologs, it says there aren't any human orthologs for that miRNA! The only orthologs are in three other mouse species. This is what Mike Smith pointed out. And if you look at the page for MIR155HG, the orthologs link isn't available, which I imagine means there are none.

This is a pretty strong indication (to me, anyway) that Ensembl doesn't think MIR155HG and Mir155hg are orthologs. NCBI has other views on that subject, however.
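If you want to see what Ensembl itself records, one option is to ask the human dataset for its mouse homolog attributes directly, instead of relying on getLDS, which drops genes without a match. A minimal sketch follows. It assumes the mmusculus_homolog_* attribute names are available in the current hsapiens_gene_ensembl dataset (check listAttributes() first, since names can change between Ensembl releases), and the helper name human_to_mouse_homologs is made up for illustration.

```r
# Hypothetical helper: ask the human Ensembl dataset which mouse homologs
# (if any) it records for a set of human gene symbols.
# Assumes biomaRt is installed and the Ensembl BioMart server is reachable.
human_to_mouse_homologs <- function(symbols) {
  human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    # Assumed attribute names; verify them with biomaRt::listAttributes(human)
    attributes = c("ensembl_gene_id",
                   "mmusculus_homolog_ensembl_gene",
                   "mmusculus_homolog_associated_gene_name",
                   "mmusculus_homolog_orthology_type",
                   "mmusculus_homolog_perc_id"),
    filters = "external_gene_name",
    values = symbols,
    mart = human
  )
}
```

Calling human_to_mouse_homologs(c("TP53", "MIR155HG")) should make it easier to tell "Ensembl has no mouse ortholog for this gene" apart from "the query itself failed". Genes without an Ensembl-assigned ortholog should show up with empty or NA homolog columns, if they are returned at all.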


Wow, interesting! You are right, it looks like Ensembl and NCBI don't agree on this. I will side with NCBI in this case. Do you have a recommendation on how to find orthologs in an all-inclusive manner? I thought Ensembl was the most comprehensive one, but I may be wrong based on this experience.


There's no such thing. NCBI probably has a pretty good rationale for why they think MIR155HG and Mir155hg are orthologs, and I would bet EBI/EMBL has a good rationale for why they think they aren't. And I would also bet that their rationales hinge on pretty subtle, sophisticated points where reasonable people could see both sides and in the end you just have to make a decision as to what you, as a group, are going to do.

This scenario most assuredly propagates through to hundreds if not thousands of genes where the two groups have landed on different sides of the argument, giving rise to many, many differences in what NCBI and EBI/EMBL consider to be orthologs.

The fact that the two groups don't agree on everything doesn't mean one is right and the other is wrong! They just disagree, based on the given evidence and whatever rules they have instituted to help make decisions when the answer isn't obvious.

Because of that, there isn't a way to find orthologs in an all-inclusive manner. If a gene is 80% homologous in two species, are they orthologs? What about 75%? If one group says > 80% means yes, and the other says > 75% means yes, then you have disagreements, but it's because they are using different cutoffs, and nobody can say for sure which one is right.
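To make the cutoff point concrete, here is a toy sketch. The gene pairs and percent values are made up for illustration (they are not real Ensembl or NCBI numbers), and real ortholog calling uses far more evidence than a single threshold.

```r
# Toy data: made-up percent values for hypothetical human/mouse gene pairs
pairs <- data.frame(
  pair = c("geneA", "geneB", "geneC"),
  perc_id = c(92, 78, 60),
  stringsAsFactors = FALSE
)

# Call a pair an "ortholog" if its percent value exceeds the cutoff
call_orthologs <- function(perc_id, cutoff) perc_id > cutoff

pairs$group1 <- call_orthologs(pairs$perc_id, 80)  # group using > 80%
pairs$group2 <- call_orthologs(pairs$perc_id, 75)  # group using > 75%

# Pairs on which the two groups disagree
disagree <- pairs$pair[pairs$group1 != pairs$group2]
print(disagree)
```

Only geneB, which falls between the two cutoffs, gets a different call from the two groups. Neither call is wrong given its own rule.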