Question: Missing Ensembl orthologues in biomaRt
mhem0 wrote (9 weeks ago):

I would like to convert murine Ensembl gene IDs to human Ensembl gene IDs using biomaRt.

library(biomaRt)
# Mouse and human gene datasets from the Ensembl BioMart
mart1 <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mart2 <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Link each mouse Ensembl gene ID to its human Ensembl gene ID(s)
genes.ensembl.biomart <- getLDS(attributes = c("ensembl_gene_id"),
                                filters = "ensembl_gene_id",
                                values = genes.ensembl.murine,
                                mart = mart1,
                                attributesL = c("ensembl_gene_id"),
                                martL = mart2)

To keep the original order, I used match():

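# Start from the original mouse IDs so the input order is preserved
genes.ensembl <- data.frame(murine_ensembl = genes.ensembl.murine)
# Look up each mouse ID in column 1 of the getLDS() result and take the human ID
# from column 2 (match() returns only the first matching row)
genes.ensembl$human_ensembl <- genes.ensembl.biomart[match(genes.ensembl[, 1], genes.ensembl.biomart[, 1]), 2]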
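getLDS() can return more than one human ID for a single mouse ID (one-to-many orthologues), and match() silently keeps only the first one. Below is a minimal sketch of how to keep all of them while preserving the query order. It uses a small hypothetical table in place of the real getLDS() result: the IDs mouse_1, human_1, etc. are made up, and columns are indexed by position because the column names returned by getLDS() may differ between biomaRt versions.

```r
# Hypothetical stand-in for the getLDS() output: column 1 = mouse ID,
# column 2 = human ID. mouse_1 has two human hits, mouse_3 has none.
ortho <- data.frame(
  mouse = c("mouse_1", "mouse_1", "mouse_2"),
  human = c("human_1", "human_2", "human_3"),
  stringsAsFactors = FALSE
)

# Query IDs in their original (arbitrary) order
query <- c("mouse_2", "mouse_1", "mouse_3")

# match() returns the FIRST matching row, so extra human hits are dropped
first_human <- ortho[match(query, ortho[, 1]), 2]

# Collect every human hit per mouse ID, separated by ";", in query order
all_human <- vapply(query, function(id) {
  hits <- ortho[ortho[, 1] == id, 2]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}, character(1), USE.NAMES = FALSE)

result <- data.frame(murine_ensembl = query,
                     first_human = first_human,
                     all_human = all_human,
                     stringsAsFactors = FALSE)
print(result)
```

If only one human ID per gene is wanted, which orthologue to keep is a separate decision; the first hit from match() is simply whichever row getLDS() happened to return first.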

genes.ensembl.murine is a vector of length 14040.

     head(genes.ensembl.murine, 20)
 [1] ENSMUSG00000025902 ENSMUSG00000033845 ENSMUSG00000025903 ENSMUSG00000033813
 [5] ENSMUSG00000033793 ENSMUSG00000025907 ENSMUSG00000051285 ENSMUSG00000061024
 [9] ENSMUSG00000025911 ENSMUSG00000045210 ENSMUSG00000025915 ENSMUSG00000098234
[13] ENSMUSG00000025917 ENSMUSG00000056763 ENSMUSG00000067851 ENSMUSG00000048960
[17] ENSMUSG00000016918 ENSMUSG00000005886 ENSMUSG00000025935 ENSMUSG00000025937

The resulting data frame has 1396 missing values.

head(genes.ensembl[which(is.na(genes.ensembl[, 2])), ], 10)
        murine_ensembl human_ensembl
12  ENSMUSG00000098234          <NA>
24  ENSMUSG00000043716          <NA>
43  ENSMUSG00000026064          <NA>
82  ENSMUSG00000073702          <NA>
85  ENSMUSG00000091937          <NA>
133 ENSMUSG00000025980          <NA>
134 ENSMUSG00000073676          <NA>
137 ENSMUSG00000097649          <NA>
146 ENSMUSG00000026035          <NA>
156 ENSMUSG00000097573          <NA>
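A short sketch for counting and extracting the unmapped genes, assuming the two-column layout of genes.ensembl shown above. The toy data frame genes.toy and its IDs are hypothetical placeholders; the last lines apply the same arithmetic to the numbers reported in the question (1396 of 14040), which comes to roughly 9.9%.

```r
# Hypothetical toy version of genes.ensembl (mouse ID, human ID or NA)
genes.toy <- data.frame(
  murine_ensembl = c("mouse_1", "mouse_2", "mouse_3", "mouse_4"),
  human_ensembl  = c("human_1", NA, "human_3", NA),
  stringsAsFactors = FALSE
)

# TRUE for rows without a human orthologue
unmapped <- is.na(genes.toy$human_ensembl)

n_missing    <- sum(unmapped)    # number of unmapped mouse genes
frac_missing <- mean(unmapped)   # fraction of unmapped mouse genes
missing_ids  <- genes.toy$murine_ensembl[unmapped]

cat(sprintf("%d of %d genes unmapped (%.1f%%)\n",
            n_missing, nrow(genes.toy), 100 * frac_missing))

# Same calculation with the counts from the question
pct_question <- round(100 * 1396 / 14040, 1)
print(pct_question)
```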

Ensembl says ENSMUSG00000098234 is Snhg6 (http://www.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000098234;r=1:9941959-9944118), and the human orthologue is ENSG00000245910 (http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000245910;r=8:66921684-66926398). However, BioMart on ensembl.org doesn't find the human orthologue for ENSMUSG00000098234 either.

Can anybody help me convert the 1396 missing genes? Is this a problem with biomaRt or with ensembl.org?

Thank you very much. Mischko

Answer:

James W. MacDonald wrote (9 weeks ago):

If you go to either of the pages you linked, they do show the same gene symbol, but neither lists any orthologs under the Comparative Genomics section. That indicates that EBI/EMBL do not consider these two genes to be orthologs.

As a counterexample, consider Sox17, whose page lists 24 primate orthologs, including SOX17.

Without looking at all 1396 genes, I cannot say whether that holds for all of them. In general, though, I have never found different results on ensembl.org than what I already got from the BioMart server, which makes sense given that both are based on the same underlying database.


Thanks James, that makes sense. So it's a biological rather than a technical issue. How do you proceed if EMBL does not find orthologous gene IDs? Is there another, more comprehensive database, or do you just accept the 10% dropouts?

(reply by mhem0)

I don't know; that is a question you will have to answer yourself. I don't know why EBI doesn't consider those genes to be orthologs, so without knowing that, how could I say that they 'do not find orthologue gene ids'? Maybe they do find all the orthologs there are, and there is a good reason to think that those ~1400 genes have no human orthologs. Or maybe they are not doing a good job and somebody else has better data.

But answering those questions requires a much deeper knowledge of the algorithm that EBI uses to define orthologs. I have at best a superficial understanding of it, so I would be loath to say that what EBI is doing is incorrect.

(reply by James W. MacDonald)
Powered by Biostar version 16.09