Question: missing gene_ids in org.Mm.egENSEMBL
Gregulator (Australia), 4.0 years ago, wrote:

I have been trying to annotate some RNA-seq data using org.Mm.eg.db. The count matrix I was sent by collaborators has Ensembl gene IDs. However, I have run into missing gene IDs in the egENSEMBL table when I try to annotate. For example, one gene I am interested in, H19, has the gene_id 14955, but this ID does not seem to be present in egENSEMBL. On the other hand, 14955 is present in egSYMBOL. Is there something basic I am missing? Is there a different table I should be using?

 

Thank you,

Greg

 

Answer: missing gene_ids in org.Mm.egENSEMBL
James W. MacDonald (United States), 4.0 years ago, wrote:

It's not clear what you are after. If you search ensembl.org for 14955, you get a page that indicates there is no Ensembl ID for this gene, so I am not sure how you even had this gene in your set of Ensembl gene IDs. Or are you saying that you expect H19 to be in your list of Ensembl genes and it's not there? If so, there's your answer.

Anyway, there are any number of genes that can be found in one or more of NCBI's databases that cannot be found in EBI's databases, and vice versa. This is particularly true for non-coding RNA.
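For the annotation task in the question, one practical consequence is that the lookup step should tolerate IDs that have no counterpart in the other database. The following is a minimal base-R sketch, not the method used by anyone in this thread. The two mapping rows are taken from the egENSEMBL excerpt quoted later in the thread; the count values and sample names are made up purely for illustration.

```r
# Toy count matrix keyed by Ensembl gene ID.
# The counts and sample names are hypothetical.
counts <- matrix(
  c(10L, 0L, 25L, 12L, 1L, 30L),
  nrow = 3,
  dimnames = list(
    c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000031"),
    c("sampleA", "sampleB")
  )
)

# Two rows in the shape returned by toTable(org.Mm.egENSEMBL).
eg_ensembl <- data.frame(
  gene_id    = c("14679", "54192"),
  ensembl_id = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  stringsAsFactors = FALSE
)

# match() returns NA for IDs without a mapping, so no row is dropped.
idx <- match(rownames(counts), eg_ensembl$ensembl_id)
annot <- data.frame(
  ensembl_id = rownames(counts),
  entrez_id  = eg_ensembl$gene_id[idx],
  stringsAsFactors = FALSE
)

# Ensembl IDs that could not be annotated.
unmapped <- annot$ensembl_id[is.na(annot$entrez_id)]
```

Note that match() keeps only the first hit, so an Ensembl ID that maps to several Entrez IDs would need separate handling.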
 


Sorry, I realize I was very confusing. I'll try to clarify using H19 as an example. When I search for H19 on the Ensembl website, I find that H19's Ensembl gene ID is ENSMUSG00000000031 and its Entrez gene ID is 14955. When I search through the egENSEMBL table, the Ensembl gene ID ENSMUSG00000000031 is not present. However, when I search through the egSYMBOL table, I find that H19 is present and has the Entrez gene ID 14955. See below for the exact commands I used to search through the tables.

> library(org.Mm.eg.db)

> egENSEMBL <- toTable(org.Mm.egENSEMBL)

Then I wrote this table to a text file and searched for ENSMUSG00000000031.

gene_id  ensembl_id
14679    ENSMUSG00000000001
54192    ENSMUSG00000000003
12544    ENSMUSG00000000028
107815   ENSMUSG00000000037
11818    ENSMUSG00000000049
67608    ENSMUSG00000000056

As you can see, there is no entry for H19 in this table. However, when I search through the egSYMBOL table, I find that there is an entry for H19.

> egSYMBOL <- toTable(org.Mm.egSYMBOL)

Then I wrote this table to a text file and searched for 14955.

gene_id  symbol
14944    Gzmg
14945    Gzmk
14950    H13
14955    H19
14957    Hist1h1d
14958    H1f0
14960    H2-Aa

So my question is, why is the gene entry for H19 missing from the egENSEMBL table? Have I done something wrong?
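The same search can be done inside R rather than in an exported text file. A minimal sketch, assuming the tables have the two-column shape shown above; to keep it self-contained it rebuilds only the excerpted rows as data frames, and the variable names are illustrative.

```r
# Excerpts of the two tables quoted above, rebuilt as data frames.
eg_ensembl <- data.frame(
  gene_id    = c("14679", "54192", "12544", "107815", "11818", "67608"),
  ensembl_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                 "ENSMUSG00000000028", "ENSMUSG00000000037",
                 "ENSMUSG00000000049", "ENSMUSG00000000056"),
  stringsAsFactors = FALSE
)
eg_symbol <- data.frame(
  gene_id = c("14944", "14945", "14950", "14955", "14957", "14958", "14960"),
  symbol  = c("Gzmg", "Gzmk", "H13", "H19", "Hist1h1d", "H1f0", "H2-Aa"),
  stringsAsFactors = FALSE
)

# Filter each table by the ID of interest.
ensembl_hit <- eg_ensembl[eg_ensembl$ensembl_id == "ENSMUSG00000000031", ]
symbol_hit  <- eg_symbol[eg_symbol$gene_id == "14955", ]

nrow(ensembl_hit)   # number of egENSEMBL rows for the H19 Ensembl ID
symbol_hit$symbol   # symbol recorded for Entrez ID 14955
```

With the full tables, the same filter would apply directly to the result of toTable(org.Mm.egENSEMBL) and toTable(org.Mm.egSYMBOL).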

Reply written 4.0 years ago by Gregulator

No, it doesn't say that 14955 is the matching gene. It says something else:

Overlapping RefSeq Gene ID 14955 matches but different biotype of misc_RNA

So you are saying 'these things are the same', and both Ensembl and NCBI are saying 'well, not really'. This gets back to what the org.Xx.eg.db packages are: simply a reformulation of data from NCBI, without interpretation on our part. In particular, the mappings start from NCBI's Gene database. If EBI and NCBI say that the gene is in the same place but is not exactly the same thing, then we won't map 14955 to ENSMUSG00000000031, because NCBI doesn't.
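One way to see what the package actually records is to query it from both directions. The sketch below wraps that in a small helper; check_mapping is a hypothetical name, not part of any package, and it assumes AnnotationDbi is installed and that db is an OrgDb object such as org.Mm.eg.db. The result depends on the annotation release, so no output is shown.

```r
# Hypothetical helper: asks an OrgDb object whether it knows an Ensembl ID
# at all, and which symbol and Ensembl IDs it records for an Entrez ID.
check_mapping <- function(db, ensembl_id, entrez_id) {
  # TRUE if the Ensembl ID appears anywhere among the package's ENSEMBL keys.
  known <- ensembl_id %in% AnnotationDbi::keys(db, keytype = "ENSEMBL")

  # For a valid Entrez ID, select() returns NA in the ENSEMBL column
  # when the package records no Ensembl mapping for it.
  from_entrez <- AnnotationDbi::select(
    db,
    keys    = entrez_id,
    columns = c("SYMBOL", "ENSEMBL"),
    keytype = "ENTREZID"
  )

  list(ensembl_id_known = known, from_entrez = from_entrez)
}

# Usage (requires the org.Mm.eg.db package):
# library(org.Mm.eg.db)
# check_mapping(org.Mm.eg.db, "ENSMUSG00000000031", "14955")
```

Checking membership in keys() first avoids the error that select() raises when none of the supplied keys are valid for the chosen keytype.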

And no, you haven't done anything wrong. Like I said before, when you have two different groups doing essentially the same thing, there are bound to be things that are not completely consistent between the two. And if you look at things from Ensembl's standpoint, they agree to disagree as well:

> getBM(c("ensembl_gene_id","mgi_symbol", "entrezgene"), "ensembl_gene_id", "ENSMUSG00000000031", mart)
     ensembl_gene_id mgi_symbol entrezgene
1 ENSMUSG00000000031        H19         NA
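The getBM() call above uses a mart object whose creation is not shown. A minimal sketch of one way it might be set up follows, assuming the biomaRt package and the Ensembl mouse dataset; lookup_entrez is a hypothetical helper name. As far as I know, recent Ensembl releases name the Entrez attribute entrezgene_id rather than entrezgene, so the sketch picks whichever the mart reports.

```r
# Hypothetical helper: looks up MGI symbol and Entrez ID for Ensembl gene IDs.
# Needs the biomaRt package and network access when it is called.
lookup_entrez <- function(ensembl_ids) {
  # Assumption: the original `mart` pointed at the Ensembl mouse gene dataset.
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")

  # Use the Entrez attribute name that this Ensembl release provides.
  attrs <- biomaRt::listAttributes(mart)$name
  entrez_attr <- if ("entrezgene_id" %in% attrs) "entrezgene_id" else "entrezgene"

  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "mgi_symbol", entrez_attr),
    filters    = "ensembl_gene_id",
    values     = ensembl_ids,
    mart       = mart
  )
}

# Usage:
# lookup_entrez("ENSMUSG00000000031")
```

The result reflects whatever Ensembl release the mart connects to, so it may differ from the output quoted above.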
Reply written 4.0 years ago by James W. MacDonald

Fair enough. Thank you very much for your help.

Reply written 4.0 years ago by Gregulator