Use of org.Hs.eg.db and TxDb.Hsapiens.UCSC.hg19.knownGene
Lna • 0
@lna-10651

Hi,

I was trying to make a list of SNPs and the names of the genes they are related to, so I used the VariantAnnotation package:

locateVariants(target, TxDb.Hsapiens.UCSC.hg19.knownGene, AllVariants())

and got a list of the corresponding gene IDs. As far as I understand, VariantAnnotation takes the gene IDs from the TxDb.Hsapiens.UCSC.hg19.knownGene package, and these are Entrez IDs, which can be used directly as keys for the org.Hs.eg.db package. When I do this,

select(org.Hs.eg.db,keys=gid, columns=c("GENENAME"),keytype="ENTREZID")

it works for some of the entries, but for others I get the error:

Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

I looked up the gene ID causing the error on the NCBI site and found that it has been replaced by another one. So it seems TxDb.Hsapiens.UCSC.hg19.knownGene provides an outdated gene ID that org.Hs.eg.db cannot handle. I checked the package version of TxDb.Hsapiens.UCSC.hg19.knownGene, and it should be the latest.

Now my question: am I doing something wrong, or is this an inconsistency between the two packages that I have to work around? Is there a simple solution to this problem?

Thanks for any help!

TxDb.Hsapiens.UCSC.hg19.knownGene org.hs.eg.db • 1.5k views
@vincent-j-carey-jr-4
United States

You can get information about the sources of the annotation resources by typing their names at the R prompt, which prints a summary of each object.

> TxDb.Hsapiens.UCSC.hg19.knownGene

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: hg19
# Organism: Homo sapiens
# Taxonomy ID: 9606
# UCSC Table: knownGene
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: GRCh37
# transcript_nrow: 82960
# exon_nrow: 289969
# cds_nrow: 237533
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)

> org.Hs.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2016-Sep26
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606

I am not sure mine are up to date, but in any case there is no guarantee that the two resources are fully consistent: one is made at UCSC and one at NCBI. You can avoid the error by checking whether your gid elements are among the keys() result for the resource you are querying, and removing those that cannot be resolved. Note that select() will not fail if at least one valid key is supplied; keys it cannot resolve simply come back as NA:

> select(org.Hs.eg.db, c("1", "8"), columns="GENENAME", keytype="ENTREZID")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID               GENENAME
1        1 alpha-1-B glycoprotein
2        8                   <NA>
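
The filtering step described above might look roughly like the sketch below. The helper name split_keys is made up for illustration and is not part of any package. The first part runs in base R on toy values. The second part assumes org.Hs.eg.db and AnnotationDbi are installed and uses a small stand-in vector where your own gid from locateVariants() would go.

```r
# Hypothetical helper (not from any package): separate IDs that the
# annotation resource knows about from those it cannot resolve.
split_keys <- function(ids, valid) {
  ids <- unique(as.character(ids[!is.na(ids)]))  # drop NAs and duplicates
  known <- ids %in% valid
  list(valid = ids[known], unresolved = ids[!known])
}

# Toy demonstration with made-up "valid" keys; no Bioconductor needed.
res <- split_keys(c("1", "8", NA, "1"), valid = c("1", "2", "9"))
print(res)

# Sketch of real use, run only if the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  gid <- c("1", "8")  # stand-in; use the gene IDs from locateVariants()
  parts <- split_keys(gid, AnnotationDbi::keys(db, keytype = "ENTREZID"))
  if (length(parts$valid) > 0) {
    print(AnnotationDbi::select(db, keys = parts$valid,
                                columns = "GENENAME", keytype = "ENTREZID"))
  }
  # IDs left over here are candidates for a manual lookup at NCBI.
  print(parts$unresolved)
}
```

Keeping the unresolved IDs in a separate vector, rather than silently dropping them, makes it easy to see how many SNPs lose their gene annotation and to look up replacement IDs by hand if needed.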