missing gene_ids in org.Mm.egENSEMBL
Gregulator ▴ 30
@gregulator-9309
Last seen 4.4 years ago
Australia

I have been trying to annotate some RNA-seq data using org.Mm.eg.db. The count matrix my collaborators sent me uses Ensembl gene IDs. However, some gene IDs appear to be missing from the egENSEMBL table when I try to annotate. For example, one gene I am interested in, H19, has the gene_id 14955, but this ID does not seem to be present in egENSEMBL. On the other hand, 14955 is present in egSYMBOL. Is there something basic I am missing? Is there a different table I should be using?

Thank you,

Greg

org.mm.eg.db annotation • 1.5k views
@james-w-macdonald-5106
Last seen 14 hours ago
United States

It's not clear what you are after. If you search ensembl.org for 14955, you get a page that indicates there is no Ensembl ID for this gene, so I am not sure how you even had this gene in your set of Ensembl gene IDs. Or are you saying that you expect H19 to be in your list of Ensembl genes and it's not there? If so, there's your answer.

Anyway, there are any number of genes that can be found in one or more of NCBI's databases that cannot be found in EBI's databases, and vice versa. This is particularly true for non-coding RNA.

Sorry, I realize I was very confusing. I'll try to clarify using H19 as an example. When I search for H19 on the Ensembl website, I find that H19’s Ensembl gene ID is ENSMUSG00000000031 and its Entrez gene ID is 14955. When I search through the egENSEMBL table, the Ensembl gene ID ENSMUSG00000000031 is not present. However, when I search through the egSYMBOL table, I find that H19 is present and has the Entrez gene ID 14955. See below for the exact commands I used to search through the tables:

> library(org.Mm.eg.db)

> egENSEMBL <- toTable(org.Mm.egENSEMBL)

Then I wrote this table to a text file and searched for ENSMUSG00000000031.

gene_id  ensembl_id
14679    ENSMUSG00000000001
54192    ENSMUSG00000000003
12544    ENSMUSG00000000028
107815   ENSMUSG00000000037
11818    ENSMUSG00000000049
67608    ENSMUSG00000000056

As you can see, there is no entry for H19 in this table. However, when I search through the egSYMBOL table, I find there is an entry for H19:

> egSYMBOL <- toTable(org.Mm.egSYMBOL)

Then I wrote this table to a text file and searched for 14955.

gene_id  symbol
14944    Gzmg
14945    Gzmk
14950    H13
14955    H19
14957    Hist1h1d
14958    H1f0
14960    H2-Aa

So my question is, why is the gene entry for H19 missing from the egENSEMBL table? Have I done something wrong?
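The text-file search described above can also be done directly in R. The following is a minimal sketch, not the poster's method: `find_id()` is a hypothetical helper, and the two small data frames are toy copies of the rows quoted above, standing in for the full output of `toTable(org.Mm.egENSEMBL)` and `toTable(org.Mm.egSYMBOL)`.

```r
# Toy stand-ins for toTable(org.Mm.egENSEMBL) and toTable(org.Mm.egSYMBOL).
# Assumption: only the rows quoted in the post are included here.
egENSEMBL <- data.frame(
  gene_id    = c("14679", "54192", "12544", "107815", "11818", "67608"),
  ensembl_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                 "ENSMUSG00000000028", "ENSMUSG00000000037",
                 "ENSMUSG00000000049", "ENSMUSG00000000056"),
  stringsAsFactors = FALSE
)
egSYMBOL <- data.frame(
  gene_id = c("14944", "14945", "14950", "14955", "14957", "14958", "14960"),
  symbol  = c("Gzmg", "Gzmk", "H13", "H19", "Hist1h1d", "H1f0", "H2-Aa"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows of `tbl` whose `column` equals `value`.
find_id <- function(tbl, column, value) {
  tbl[tbl[[column]] == value, , drop = FALSE]
}

# Look up H19 by its Ensembl ID and by its Entrez gene ID.
h19_ensembl <- find_id(egENSEMBL, "ensembl_id", "ENSMUSG00000000031")
h19_symbol  <- find_id(egSYMBOL, "gene_id", "14955")

nrow(h19_ensembl)  # zero rows means the Ensembl ID is absent from the table
h19_symbol
```

With the real package loaded, the same helper should work on the full tables, since they use the column names shown in the post.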


No, it doesn't say that 14955 is the matching gene. It says something else:

Overlapping RefSeq Gene ID 14955 matches but different biotype of misc_RNA

So you are saying 'these things are the same', and both Ensembl and NCBI are saying, 'well, not really'. This gets back to what the org.Xx.eg.db packages are: simply a reformulation of data from NCBI, without interpretation on our part, based on mappings that start from NCBI's Gene database. If EBI and NCBI say that the gene is in the same place but is not exactly the same thing, then we won't map 14955 to ENSMUSG00000000031, because NCBI doesn't.

And no, you haven't done anything wrong. Like I said before, when you have two different groups doing essentially the same thing, there are bound to be things that are not completely consistent between the two. And if you look at things from Ensembl's standpoint, they agree to disagree as well:

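> library(biomaRt)
> # Assumed setup: the post does not show how `mart` was created
> mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> getBM(c("ensembl_gene_id","mgi_symbol", "entrezgene"), "ensembl_gene_id", "ENSMUSG00000000031", mart)
     ensembl_gene_id mgi_symbol entrezgene
1 ENSMUSG00000000031        H19         NA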
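Because the package carries only the mappings that NCBI provides, some Ensembl IDs in a count matrix will have no Entrez gene ID. One way to find them up front is sketched below, under stated assumptions: `count_ids` and `mapping` are illustrative toy data (in practice `mapping` would come from `toTable(org.Mm.egENSEMBL)`), and the variable names are not from the original thread.

```r
# Ensembl IDs as they might appear in the row names of a count matrix
# (illustrative values; the second one is H19's Ensembl ID from the thread).
count_ids <- c("ENSMUSG00000000001", "ENSMUSG00000000031", "ENSMUSG00000000028")

# Toy mapping table with the same column names as toTable(org.Mm.egENSEMBL).
mapping <- data.frame(
  gene_id    = c("14679", "12544"),
  ensembl_id = c("ENSMUSG00000000001", "ENSMUSG00000000028"),
  stringsAsFactors = FALSE
)

# match() preserves the input order and yields NA where no mapping exists.
annot <- data.frame(
  ensembl_id = count_ids,
  gene_id    = mapping$gene_id[match(count_ids, mapping$ensembl_id)],
  stringsAsFactors = FALSE
)

# Ensembl IDs that have no Entrez gene ID in the mapping table.
unmapped <- annot$ensembl_id[is.na(annot$gene_id)]
unmapped
```

Note that `match()` returns only the first hit for each ID, so this sketch ignores Ensembl IDs that map to more than one Entrez gene ID; a join such as `merge()` would be needed to keep all of them.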

Fair enough. Thank you very much for your help.
