Missing Ensembl orthologues in biomaRt
mhem ▴ 10
@mhem-19823
Last seen 8 hours ago
Germany

I would like to convert murine Ensembl gene IDs to human Ensembl gene IDs using biomaRt.

library(biomaRt)
mart1 <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mart2 <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
genes.ensembl.biomart <- getLDS(attributes = c("ensembl_gene_id"), filters = "ensembl_gene_id", values = genes.ensembl.murine, mart = mart1, attributesL = c("ensembl_gene_id"), martL = mart2)

To keep the order I used match.

genes.ensembl <- data.frame (murine_ensembl = genes.ensembl.murine)
genes.ensembl$human_ensembl <- genes.ensembl.biomart[match(genes.ensembl[,1], genes.ensembl.biomart[,1]),2]
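
A caveat on the match() step, sketched with made-up IDs (M1, H2a and so on are hypothetical, not real Ensembl IDs): match() returns only the first hit per query ID, so if a mouse gene maps to several human genes, all but one are silently dropped. One possible alternative, assuming the mapping table has the same two-column layout as the getLDS() result, is merge() with all.x = TRUE plus a helper position column to restore the original order.

```r
# Toy mapping table with hypothetical IDs; column order mimics the getLDS() output.
mapping <- data.frame(
  mouse = c("M1", "M2", "M2", "M4"),
  human = c("H1", "H2a", "H2b", "H4"),
  stringsAsFactors = FALSE
)
query <- c("M4", "M2", "M3", "M1")  # M3 has no orthologue in the table

# match() returns the position of the first hit only, so H2b is dropped.
first_hit <- mapping$human[match(query, mapping$mouse)]

# merge() with all.x = TRUE keeps every query ID and every mapping row;
# the helper column 'pos' restores the original query order afterwards.
all_hits <- merge(data.frame(mouse = query, pos = seq_along(query),
                             stringsAsFactors = FALSE),
                  mapping, by = "mouse", all.x = TRUE)
all_hits <- all_hits[order(all_hits$pos), c("mouse", "human")]
rownames(all_hits) <- NULL
```

Here first_hit has one entry per query ID (NA for M3), while all_hits has two rows for M2, so the row count can exceed the length of the query vector.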

genes.ensembl.murine is a vector of length 14040.

     head(genes.ensembl.murine, 20)
 [1] ENSMUSG00000025902 ENSMUSG00000033845 ENSMUSG00000025903 ENSMUSG00000033813
 [5] ENSMUSG00000033793 ENSMUSG00000025907 ENSMUSG00000051285 ENSMUSG00000061024
 [9] ENSMUSG00000025911 ENSMUSG00000045210 ENSMUSG00000025915 ENSMUSG00000098234
[13] ENSMUSG00000025917 ENSMUSG00000056763 ENSMUSG00000067851 ENSMUSG00000048960
[17] ENSMUSG00000016918 ENSMUSG00000005886 ENSMUSG00000025935 ENSMUSG00000025937

The resulting data frame has 1396 missing values.

head(genes.ensembl[which(is.na(genes.ensembl[,2])),], 10)
        murine_ensembl human_ensembl
12  ENSMUSG00000098234          <NA>
24  ENSMUSG00000043716          <NA>
43  ENSMUSG00000026064          <NA>
82  ENSMUSG00000073702          <NA>
85  ENSMUSG00000091937          <NA>
133 ENSMUSG00000025980          <NA>
134 ENSMUSG00000073676          <NA>
137 ENSMUSG00000097649          <NA>
146 ENSMUSG00000026035          <NA>
156 ENSMUSG00000097573          <NA>

Ensembl says ENSMUSG00000098234 is Snhg6 (http://www.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000098234;r=1:9941959-9944118), and the human orthologue is ENSG00000245910 (http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000245910;r=8:66921684-66926398). However, using BioMart on ensembl.org also doesn't find the human orthologue for ENSMUSG00000098234.

Can anybody help me to convert the missing 1396 genes? Is it a problem with biomaRt or with ensembl.org?

Thank you very much. Mischko

biomaRt ensemble • 1.0k views
@james-w-macdonald-5106
Last seen 1 day ago
United States

When you go to either of the pages you link, while they do show the same gene symbol, they do not have a link for any orthologs under the comparative genomics section, indicating that EBI/EMBL do not think these two genes are orthologs.

As a counter example, consider Sox17, which lists 24 primate orthologs, including SOX17.

Without looking at all 1396 genes, I cannot say if that follows for all of them, but in general I haven't ever found different results when going to ensembl.org than what I already got from the Biomart server, which makes sense, given that they are based on the same underlying database.


Thanks James, that makes sense. So it is a biological rather than a technical issue. How do you proceed if EMBL do not find orthologue gene ids? Is there a more comprehensive database, or do you just accept the 10% dropouts?


I don't know - that is a question that you will have to answer yourself. I don't know why EBI doesn't think those are orthologs, so without knowing that, how could I say that they 'do not find orthologue gene ids'? Maybe they do find them all, and there is a reason to think that those 1400 genes have no human orthologs. Or maybe they are not doing a good job and somebody else has better data.

But answering those questions requires a much deeper knowledge of the algorithm that EBI uses to define orthologs, and I have at best a superficial knowledge, so would be loath to say that what EBI is doing is not correct.
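
For anyone who wants to see what Ensembl itself records for these genes, one option is to query the homologue attributes of the mouse dataset directly with getBM() rather than linking two marts with getLDS(). The following is only a sketch under assumptions: fetch_human_orthologues() is a hypothetical helper name, and the hsapiens_homolog_* attribute names are given as I recall them, so they should be checked against listAttributes() for the mart in use. It needs the biomaRt package and a network connection.

```r
# Sketch only: requires the biomaRt package and network access to Ensembl.
# fetch_human_orthologues() is a hypothetical helper, not part of biomaRt.
fetch_human_orthologues <- function(mouse_ids) {
  mouse_mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(
    # Attribute names are assumed; verify with biomaRt::listAttributes(mouse_mart).
    attributes = c("ensembl_gene_id",
                   "hsapiens_homolog_ensembl_gene",
                   "hsapiens_homolog_orthology_type",
                   "hsapiens_homolog_orthology_confidence"),
    filters = "ensembl_gene_id",
    values = mouse_ids,
    mart = mouse_mart
  )
}

# Example call with two IDs from the question (not run here):
# orth <- fetch_human_orthologues(c("ENSMUSG00000098234", "ENSMUSG00000025902"))
# table(orth$hsapiens_homolog_orthology_type, useNA = "ifany")
```

The orthology type and confidence columns do not explain why a gene has no annotated orthologue, but they make it easier to separate genes with no call at all from those with one-to-many or low-confidence calls.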
