Question

Missing Ensembl orthologues in biomaRt

1

mhem ▴ 10

@mhem-19823

Last seen 4 months ago

Germany

I would like to convert murine Ensembl gene IDs to human Ensembl gene IDs using biomaRt.

library(biomaRt)
mart1 <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mart2 <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
genes.ensembl.biomart <- getLDS(attributes = c("ensembl_gene_id"), filters = "ensembl_gene_id", values = genes.ensembl.murine, mart = mart1, attributesL = c("ensembl_gene_id"), martL = mart2)

To keep the order I used match.

genes.ensembl <- data.frame (murine_ensembl = genes.ensembl.murine)
genes.ensembl$human_ensembl <- genes.ensembl.biomart[match(genes.ensembl[,1], genes.ensembl.biomart[,1]),2]
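
One caveat with `match`: it returns only the first hit per mouse ID, so a mouse gene with several human orthologues is silently reduced to one row. A minimal sketch with made-up toy data (the IDs and column names below are placeholders, not real query results) showing how to spot such cases and keep every pair with `merge`:

```r
# Toy stand-ins for genes.ensembl.murine and the getLDS() result.
# These IDs are placeholders for illustration only.
mouse_ids <- c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C")
lds <- data.frame(
  Gene.stable.ID   = c("ENSMUSG_A", "ENSMUSG_A", "ENSMUSG_C"),
  Gene.stable.ID.1 = c("ENSG_1", "ENSG_2", "ENSG_3"),
  stringsAsFactors = FALSE
)

# match() keeps only the first human ID for each mouse ID
first_hit <- lds[match(mouse_ids, lds[, 1]), 2]

# Mouse IDs that map to more than one human ID
multi <- unique(lds[duplicated(lds[, 1]), 1])

# Left join: keeps every mouse/human pair, plus an NA row for unmapped mouse IDs
all_pairs <- merge(
  data.frame(Gene.stable.ID = mouse_ids, stringsAsFactors = FALSE),
  lds,
  by = "Gene.stable.ID",
  all.x = TRUE
)
```

Note that `merge` sorts by the key column, so the original order is not preserved; `match` is still the simpler choice when one human ID per mouse gene is enough.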

genes.ensembl.murine is a vector of length 14040.

head(genes.ensembl.murine, 20)
 [1] ENSMUSG00000025902 ENSMUSG00000033845 ENSMUSG00000025903 ENSMUSG00000033813
 [5] ENSMUSG00000033793 ENSMUSG00000025907 ENSMUSG00000051285 ENSMUSG00000061024
 [9] ENSMUSG00000025911 ENSMUSG00000045210 ENSMUSG00000025915 ENSMUSG00000098234
[13] ENSMUSG00000025917 ENSMUSG00000056763 ENSMUSG00000067851 ENSMUSG00000048960
[17] ENSMUSG00000016918 ENSMUSG00000005886 ENSMUSG00000025935 ENSMUSG00000025937

The resulting data frame has 1396 missing values.

head(genes.ensembl[which(is.na(genes.ensembl[2])),], 10)
        murine_ensembl human_ensembl
12  ENSMUSG00000098234          <NA>
24  ENSMUSG00000043716          <NA>
43  ENSMUSG00000026064          <NA>
82  ENSMUSG00000073702          <NA>
85  ENSMUSG00000091937          <NA>
133 ENSMUSG00000025980          <NA>
134 ENSMUSG00000073676          <NA>
137 ENSMUSG00000097649          <NA>
146 ENSMUSG00000026035          <NA>
156 ENSMUSG00000097573          <NA>

Ensembl says ENSMUSG00000098234 is Snhg6 (http://www.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000098234;r=1:9941959-9944118), and the human orthologue is ENSG00000245910 (http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000245910;r=8:66921684-66926398). However, using BioMart on ensembl.org also doesn't find the human orthologue for ENSMUSG00000098234.

Can anybody help me to convert the missing 1396 genes? Is it a problem with biomaRt or with ensembl.org?

Thank you very much. Mischko

biomaRt ensemble • 1.1k views

updated 6.1 years ago by James W. MacDonald 68k • written 6.1 years ago by mhem ▴ 10

score 2 · Accepted Answer · 2019-02-11

2

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

Both pages you link show the same gene symbol, but neither lists any orthologs under the Comparative Genomics section, which indicates that EBI/EMBL do not consider these two genes to be orthologs.

As a counterexample, consider Sox17, which lists 24 primate orthologs, including SOX17.

Without looking at all 1396 genes, I cannot say if that follows for all of them, but in general I haven't ever found different results when going to ensembl.org than what I already got from the Biomart server, which makes sense, given that they are based on the same underlying database.
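
To look at all 1396 genes at once rather than page by page, one option is to ask BioMart for its homology attributes directly. This is only a sketch: it assumes the mouse dataset exposes `hsapiens_homolog_ensembl_gene` and `hsapiens_homolog_orthology_type` (worth confirming with `listAttributes()`), and both function names are made up for illustration.

```r
# Fetch Ensembl's human homology annotation for a vector of mouse gene IDs.
# Needs network access and the biomaRt package; the homolog attribute names
# are assumptions to verify with listAttributes(mart).
fetch_human_homologs <- function(mouse_ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id",
                   "hsapiens_homolog_ensembl_gene",
                   "hsapiens_homolog_orthology_type"),
    filters = "ensembl_gene_id",
    values = mouse_ids,
    mart = mart
  )
}

# Hypothetical helper: mouse IDs with no annotated human orthologue.
# An empty string or NA in the homolog column is treated as "none".
ids_without_ortholog <- function(mouse_ids, homologs) {
  human <- homologs$hsapiens_homolog_ensembl_gene
  mapped <- homologs$ensembl_gene_id[!is.na(human) & human != ""]
  setdiff(mouse_ids, mapped)
}

# Usage (not run here):
# homologs <- fetch_human_homologs(genes.ensembl.murine)
# table(homologs$hsapiens_homolog_orthology_type)
# ids_without_ortholog(genes.ensembl.murine, homologs)
```

If the IDs returned by the helper are the same ones that come back as NA from `getLDS`, that would support the idea that the gaps reflect the Ensembl annotation and not anything biomaRt is doing.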

6.1 years ago • James W. MacDonald 68k

0

Thanks James, that makes sense. So it is a biological issue rather than a technical one. How do you proceed if EMBL does not find orthologue gene IDs? Is there a more comprehensive database, or do you just accept the 10% dropout?

6.1 years ago • mhem ▴ 10

1

I don't know - that is a question that you will have to answer yourself. I don't know why EBI doesn't think those are orthologs, so without knowing that, how could I say that they 'do not find orthologue gene ids'? Maybe they do find them all, and there is a reason to think that those 1400 genes have no human orthologs. Or maybe they are not doing a good job and somebody else has better data.

But answering those questions requires a much deeper knowledge of the algorithm that EBI uses to define orthologs, and I have at best a superficial knowledge, so would be loath to say that what EBI is doing is not correct.

6.1 years ago • James W. MacDonald 68k