Question: Missing Ensembl orthologues in biomaRt
mhem0 wrote (8 months ago):

I would like to convert murine Ensembl gene IDs to human Ensembl gene IDs using biomaRt:

library(biomaRt)
mart1 <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")  # mouse (source) dataset
mart2 <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")   # human (linked) dataset
genes.ensembl.biomart <- getLDS(attributes = c("ensembl_gene_id"), filters = "ensembl_gene_id",
                                values = genes.ensembl.murine, mart = mart1,
                                attributesL = c("ensembl_gene_id"), martL = mart2)


To keep the original order of the mouse IDs, I used match():

genes.ensembl <- data.frame(murine_ensembl = genes.ensembl.murine)
# match() gives the row of the first hit in the getLDS() result; unmatched IDs become NA
genes.ensembl$human_ensembl <- genes.ensembl.biomart[match(genes.ensembl[, 1], genes.ensembl.biomart[, 1]), 2]
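Because `match()` returns only the position of the first hit, a mouse gene that getLDS() links to more than one human gene (a one-to-many orthologue) keeps only its first human ID in the code above. A minimal sketch of an order-preserving alternative that keeps every hit, assuming a two-column mapping table with the mouse ID in column 1 and the human ID in column 2, as used above. `map_keep_order` and the toy IDs are hypothetical and not part of biomaRt.

```r
# Hypothetical helper: map `ids` through a two-column table `map`
# (column 1 = source ID, column 2 = target ID), preserving the order of `ids`.
# Unlike a plain match(), every target hit is kept, joined with `sep`;
# IDs with no hit become NA.
map_keep_order <- function(ids, map, sep = ";") {
  src <- as.character(map[[1]])
  tgt <- as.character(map[[2]])
  # Ignore missing or empty target IDs
  keep <- !is.na(tgt) & tgt != ""
  # Named list: source ID -> all of its target IDs
  groups <- split(tgt[keep], src[keep])
  # Named character vector: source ID -> "target1;target2;..."
  collapsed <- vapply(groups, function(x) paste(unique(x), collapse = sep), character(1))
  # Indexing by a name that is absent returns NA
  unname(collapsed[as.character(ids)])
}

# Toy mapping with placeholder IDs (not real Ensembl data)
toy_map <- data.frame(mouse = c("m1", "m2", "m2"),
                      human = c("h1", "h2a", "h2b"),
                      stringsAsFactors = FALSE)
toy_ids <- c("m2", "m3", "m1")

first_hit <- toy_map[match(toy_ids, toy_map[, 1]), 2]  # plain match(): first hit only
all_hits  <- map_keep_order(toy_ids, toy_map)          # all hits, same order as toy_ids
```

With the question's objects this would be called as `map_keep_order(genes.ensembl.murine, genes.ensembl.biomart)`. Whether to collapse, duplicate, or drop one-to-many genes is an analysis decision rather than a technical one.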


genes.ensembl.murine is a vector of length 14040.

head(genes.ensembl.murine, 20)
[1] ENSMUSG00000025902 ENSMUSG00000033845 ENSMUSG00000025903 ENSMUSG00000033813
[5] ENSMUSG00000033793 ENSMUSG00000025907 ENSMUSG00000051285 ENSMUSG00000061024
[9] ENSMUSG00000025911 ENSMUSG00000045210 ENSMUSG00000025915 ENSMUSG00000098234
[13] ENSMUSG00000025917 ENSMUSG00000056763 ENSMUSG00000067851 ENSMUSG00000048960
[17] ENSMUSG00000016918 ENSMUSG00000005886 ENSMUSG00000025935 ENSMUSG00000025937


The resulting data frame has 1396 missing values.

head(genes.ensembl[which(is.na(genes.ensembl[, 2])), ], 10)
        murine_ensembl human_ensembl
12  ENSMUSG00000098234          <NA>
24  ENSMUSG00000043716          <NA>
43  ENSMUSG00000026064          <NA>
82  ENSMUSG00000073702          <NA>
85  ENSMUSG00000091937          <NA>
133 ENSMUSG00000025980          <NA>
134 ENSMUSG00000073676          <NA>
137 ENSMUSG00000097649          <NA>
146 ENSMUSG00000026035          <NA>
156 ENSMUSG00000097573          <NA>
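A small sketch for quantifying the dropout: count the NA entries in the mapped column and express them as a fraction of all input IDs. `summarise_missing` is a hypothetical helper, and the toy vector is placeholder data.

```r
# Hypothetical helper: number of NA entries in `x`, total length, and NA fraction
summarise_missing <- function(x) {
  n_missing <- sum(is.na(x))
  c(n_missing = n_missing, n_total = length(x), fraction = n_missing / length(x))
}

# Toy input with placeholder values
toy <- c("h1", NA, "h2", NA)
summarise_missing(toy)

# The numbers reported in the question: 1396 unmapped out of 14040 mouse IDs
round(1396 / 14040, 4)  # 0.0994, i.e. roughly 10%
```

Applied to `genes.ensembl$human_ensembl`, it should report the 1396 of 14040 figure given above.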


Ensembl says ENSMUSG00000098234 is Snhg6 (http://www.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000098234;r=1:9941959-9944118), and the human orthologue is ENSG00000245910 (http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000245910;r=8:66921684-66926398). However, the BioMart web interface on ensembl.org does not find a human orthologue for ENSMUSG00000098234 either.
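The same check can be run from R. As an alternative to getLDS(), here is a minimal sketch that queries the mouse dataset's own human-homolog attributes with getBM(). The attribute names `hsapiens_homolog_ensembl_gene` and `hsapiens_homolog_orthology_type` are assumptions to confirm with `listAttributes()` for your Ensembl release, and `query_human_homologs` is a hypothetical helper.

```r
# Hypothetical helper: ask the mouse mart directly for its human-homolog attributes.
# Assumption: these attribute names exist in mmusculus_gene_ensembl for your release;
# verify with biomaRt::listAttributes(mart).
query_human_homologs <- function(mouse_ids, mart) {
  biomaRt::getBM(attributes = c("ensembl_gene_id",
                                "hsapiens_homolog_ensembl_gene",
                                "hsapiens_homolog_orthology_type"),
                 filters = "ensembl_gene_id",
                 values = mouse_ids,
                 mart = mart)
}

# Usage (needs network access and the biomaRt package):
# mart1 <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# res <- query_human_homologs("ENSMUSG00000098234", mart1)
```

If Ensembl records no human orthologue for the gene, expect either no row or an empty homolog field, which would agree with the web interface.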

Can anybody help me convert the 1396 missing genes? Is this a problem with biomaRt or with ensembl.org?

Thank you very much. Mischko

Answer: Missing Ensembl orthologues in biomaRt
James W. MacDonald (United States) wrote (8 months ago):

If you go to either of the pages you linked, they do show the same gene symbol, but neither page links to any orthologs under the Comparative Genomics section, which indicates that EBI/EMBL do not consider these two genes to be orthologs.

As a counterexample, consider Sox17, whose page lists 24 primate orthologs, including SOX17.

Without looking at all 1396 genes I cannot say whether that holds for all of them, but in general I have never found different results on ensembl.org than what I already got from the BioMart server, which makes sense given that both are based on the same underlying database.

Thanks James, that makes sense. So it is a biological rather than a technical issue. How do you proceed if EMBL does not find orthologous gene IDs? Is there a more comprehensive database, or do you just accept the 10% dropouts?


I don't know; that is a question you will have to answer yourself. I don't know why EBI doesn't consider those genes to be orthologs, so without knowing that, how could I say that they 'do not find orthologue gene ids'? Maybe they do find them all, and there is a reason to think that those ~1400 genes have no human orthologs. Or maybe they are not doing a good job and somebody else has better data.

But answering those questions requires a much deeper knowledge of the algorithm that EBI uses to define orthologs, and I have at best a superficial knowledge of it, so I would be loath to say that what EBI is doing is incorrect.