ENSEMBL records not complete in org.Mm.eg.db?
Ming Wang • 0
@ming-wang-8281
Last seen 2.3 years ago
United States

When I convert gene IDs from ENSEMBL to SYMBOL or ENTREZID using the org.Mm.eg.db package, most of the genes fail to map.

So I checked the package org.Mm.eg.db:

It turns out that there are 71927 ENTREZID records in total, but only 33045 ENSEMBL records.

Only 45.7% of the ENTREZID keys could be converted to ENSEMBL.

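library(org.Mm.eg.db)
orgdb <- org.Mm.eg.db

# Number of distinct ENSEMBL keys in the package
g1 <- keys(orgdb, keytype = "ENSEMBL")
length(g1)

# Number of distinct ENTREZID keys in the package
g2 <- keys(orgdb, keytype = "ENTREZID")
length(g2)

# Map every ENTREZID to ENSEMBL (first match only) and drop the unmapped ones
g2e <- mapIds(orgdb, keys = g2, column = "ENSEMBL", keytype = "ENTREZID", multiVals = "first")
g2e <- g2e[!is.na(g2e)]
length(g2e)

# Fraction of ENTREZID keys that have an ENSEMBL mapping
length(g2e) / length(g2)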
org.Mm.eg.db • 854 views
Kevin Blighe ★ 3.9k
@kevin
Last seen 1 day ago
Republic of Ireland

Hi,

You should not expect a complete mapping of IDs across different 'key types', with key types in this case representing different annotation databases such as Ensembl, MGI symbols, RefSeq / Entrez, VEGA, etc.

Each annotation database has different rules about what to include. There are many postings on this all across the World Wide Web, for example:

As the [I assume] analyst, you can set rules about what to do with these 'unmapped' IDs. Most will be predicted genes that were found to have negligible expression in some experiments. Keep in mind, also, that there are many thousands of processed and unprocessed pseudogenes in the genome.
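As a rough illustration of such a rule, the sketch below maps a few ENSEMBL IDs to symbols and separates the ones that come back as NA, so they can be inspected or dropped explicitly rather than lost silently. The helper name `split_mapped` and the made-up ID `ENSMUSG99999999999` are my own assumptions for the example, not part of org.Mm.eg.db; the database part only runs if the package is installed.

```r
# Hypothetical helper (not part of any package): split the named vector
# returned by mapIds() into successfully mapped values and unmapped keys.
split_mapped <- function(mapped) {
  is_unmapped <- is.na(mapped)
  list(
    mapped   = mapped[!is_unmapped],        # named vector: key -> value
    unmapped = names(mapped)[is_unmapped]   # keys with no match
  )
}

# Only touch the annotation package if it is actually installed.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Mm.eg.db))

  # A few real ENSEMBL keys taken from the package itself, plus one
  # made-up ID (an assumption for illustration) that cannot map.
  ens <- c(head(keys(org.Mm.eg.db, keytype = "ENSEMBL"), 3),
           "ENSMUSG99999999999")

  sym <- mapIds(org.Mm.eg.db, keys = ens, column = "SYMBOL",
                keytype = "ENSEMBL", multiVals = "first")

  res <- split_mapped(sym)
  print(res$mapped)     # IDs that received a symbol
  print(res$unmapped)   # IDs to review, keep under their original ID, or drop
}
```

Whether the unmapped IDs are discarded, kept under their Ensembl ID, or looked up in another resource is a decision for the analysis at hand; the point is only to make that decision visible.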

Kevin
