Annotation with org.Hs.eg.db Genome Build 38
Bine ▴ 20
@bine-23912
UK

Dear all,

I would be really happy if you could help me here.

I am fetching the annotation for my genes via org.Hs.eg.db, which should use genome build 38.

What happens is that for some genes I get the annotation and for others I don't, and I can't see why it works for some and not for others. A manual search on https://www.ensembl.org/index.html finds all of the genes.

Example:

ENSG00000217120 - no gene annotation found (but a manual search finds it on Ensembl)

ENSG00000183463 - URAD (gene annotation found)

Does anyone have any idea what the cause could be?

This is how I fetch the annotation:

mapIds(org.Hs.eg.db, keys=ens.str, column="SYMBOL", keytype="ENSEMBL", multiVals="first")

Thank you, Bine

AnnotationDbi org.Hs.eg.db
@james-w-macdonald-5106
United States

First, an OrgDb isn't tied to a particular genome build, because none of the information in that package is intended to be tied to any genomic position. There are (still) genomic positions in those objects; they are meant to be removed, but that has yet to occur.

Second, the name of the OrgDb is intended to inform you of the provenance of those data. So org.Hs.eg.db is supposed to inform you that it's an OrgDb for Homo sapiens, based on Entrez Gene (what NCBI Gene IDs used to be called). And the last part is what matters here. The central ID for these packages is the NCBI Gene ID, and all mappings are based on those IDs. So if you ask for the HUGO symbol for an Ensembl ID, what ends up happening is you first map the Ensembl ID to its corresponding NCBI Gene ID, and then the Gene ID is mapped to the correct HUGO symbol.
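The two-step route described above can be made explicit. This is only a sketch of the idea, not how AnnotationDbi implements the lookup internally; `ens_id`, `gene_id` and `symbol` are illustrative names, and the Ensembl ID is the one from the question that does map.

```r
library(org.Hs.eg.db)

# An Ensembl ID from the question that is known to map (to URAD)
ens_id <- "ENSG00000183463"

# Step 1: Ensembl gene ID -> NCBI (Entrez) Gene ID, the central key of the OrgDb
gene_id <- mapIds(org.Hs.eg.db, keys = ens_id, column = "ENTREZID",
                  keytype = "ENSEMBL", multiVals = "first")

# Step 2: NCBI Gene ID -> HUGO symbol
symbol <- mapIds(org.Hs.eg.db, keys = unname(gene_id), column = "SYMBOL",
                 keytype = "ENTREZID", multiVals = "first")

symbol
```

If step 1 has nothing to return for an Ensembl ID, there is no Gene ID to carry into step 2, which is why such IDs get no symbol.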

First try the OrgDb

> library(org.Hs.eg.db)

> select(org.Hs.eg.db, "ENSG00000217120", c("SYMBOL","ENTREZID"), "ENSEMBL")
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.
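That error is raised when none of the supplied keys is valid; in a mixed vector the unmatched IDs typically just come back as NA, which matches what the question describes. A minimal sketch for listing up front which IDs the OrgDb does not know, assuming `ens.str` holds the Ensembl IDs (here just the two from the question); `has_key`, `unmapped` and `symbols` are illustrative names:

```r
library(org.Hs.eg.db)

# Assumed input: the two example IDs from the question
ens.str <- c("ENSG00000217120", "ENSG00000183463")

# TRUE where the OrgDb has this Ensembl ID as a key
has_key <- ens.str %in% keys(org.Hs.eg.db, keytype = "ENSEMBL")

# IDs with no route through an NCBI Gene ID
unmapped <- ens.str[!has_key]

# Result vector, NA by default, filled in only for the valid keys
symbols <- setNames(rep(NA_character_, length(ens.str)), ens.str)
if (any(has_key)) {
  symbols[has_key] <- mapIds(org.Hs.eg.db, keys = ens.str[has_key],
                             column = "SYMBOL", keytype = "ENSEMBL",
                             multiVals = "first")
}

unmapped
symbols
```

The `if (any(has_key))` guard avoids calling `mapIds()` with no valid keys at all, which is the situation that triggers the error shown above.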


As you already know, there are no mappings there. Let's try biomaRt:

> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", mirror = "useast")
> getBM(c("hgnc_symbol","entrezgene_id","ensembl_gene_id"), "ensembl_gene_id", "ENSG00000217120", mart)
  hgnc_symbol entrezgene_id ensembl_gene_id
1          NA            NA ENSG00000217120


So you can see there is no NCBI Gene ID that corresponds to this Ensembl ID. But what about the HUGO symbol?

If you go to Ensembl, it appears there is a gene symbol (either Z98755.1 or CLPX), but if you thought that, you would be wrong. There are 'gene symbols' that people make up, but they aren't real gene symbols! Real gene symbols come from HUGO, and according to that resource there isn't a symbol for this pseudogene. Probably because of the pseudo part, I would imagine.
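To run the same check over a whole list of IDs, the biomaRt query can be given a vector and asked for the biotype as well. This is a sketch under a few assumptions: it needs a working connection to Ensembl, it uses the default mirror rather than the one shown earlier, and `ens`, `res` and `no_ncbi` are illustrative names. `gene_biotype` is an Ensembl attribute that was not part of the original query.

```r
library(biomaRt)

# Assumed input: the two example IDs from the question
ens <- c("ENSG00000183463", "ENSG00000217120")

mart <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# One row (or more) per Ensembl ID, with NCBI Gene ID, HGNC symbol and biotype
res <- getBM(attributes = c("ensembl_gene_id", "entrezgene_id",
                            "hgnc_symbol", "gene_biotype"),
             filters = "ensembl_gene_id",
             values = ens,
             mart = mart)

# Rows without an NCBI Gene ID: these cannot be mapped through org.Hs.eg.db
no_ncbi <- res[is.na(res$entrezgene_id), ]
no_ncbi
```

Looking at `gene_biotype` for the rows in `no_ncbi` is a quick way to see whether the unmapped IDs are mostly pseudogenes or other non-coding entries.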


Great, thank you so much, this makes a lot of sense. So does the name "pseudogene" have something to do with it having no "real symbol"?


Of course. A pseudogene is just a section of the genome that resembles a real gene. Hence the pseudo part. Why bother giving a gene symbol to a thing that isn't really thought of as being a real gene?


Great, thanks!