Question: Use of org.Hs.eg.db and TxDb.Hsapiens.UCSC.hg19.knownGene
Lna wrote (12 months ago):

Hi,

I was trying to make a list of SNPs and the names of the genes they are related to, so I used the VariantAnnotation package:

locateVariants(target, TxDb.Hsapiens.UCSC.hg19.knownGene, AllVariants())

and got a list of the corresponding gene IDs. As far as I understand, VariantAnnotation takes the gene IDs from the TxDb.Hsapiens.UCSC.hg19.knownGene package, and these are Entrez IDs, which can be used directly as keys for the org.Hs.eg.db package. When I do this,

select(org.Hs.eg.db,keys=gid, columns=c("GENENAME"),keytype="ENTREZID")

it works for some of the entries, but then I get this error:

Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

I looked up the gene ID causing the error on the NCBI site and found that it has been replaced by another one. So it seems TxDb.Hsapiens.UCSC.hg19.knownGene provides an outdated gene ID that org.Hs.eg.db cannot resolve. I checked the version of TxDb.Hsapiens.UCSC.hg19.knownGene, and it should be the latest.

Now my question: Am I doing something wrong, or is this an inconsistency between the two packages that I have to deal with? Is there a simple way to solve this problem?

Thanks for any help!

Vincent J. Carey, Jr. (United States) wrote, 12 months ago:

You can get information about the sources of the annotation resources by printing the objects at the R prompt.

> TxDb.Hsapiens.UCSC.hg19.knownGene

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: hg19
# Organism: Homo sapiens
# Taxonomy ID: 9606
# UCSC Table: knownGene
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: GRCh37
# transcript_nrow: 82960
# exon_nrow: 289969
# cds_nrow: 237533
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)

 

> org.Hs.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2016-Sep26
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606

I am not sure mine are up to date, but in any case there is no guarantee that the two resources are fully consistent: the TxDb package is built from UCSC data and the OrgDb package from NCBI data. You can avoid the error by checking which of your gid elements are present in the keys() result for the resource you are querying, and removing those that cannot be resolved. Note that select() does not fail as long as at least one valid key is supplied:

> select(org.Hs.eg.db, c("1", "8"), columns="GENENAME", keytype="ENTREZID")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID               GENENAME
1        1 alpha-1-B glycoprotein
2        8                   <NA>
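
The filtering step described above can be sketched roughly as follows. This is only an illustration under some assumptions: split_keys() is a hypothetical helper written for this post (it is not part of AnnotationDbi or any other package), and gid is assumed to be a character vector of Entrez IDs. The annotation part is wrapped in requireNamespace() so the sketch still runs when org.Hs.eg.db is not installed.

```r
# Hypothetical helper (not part of any package): split a vector of candidate
# IDs into those found among a set of valid keys and those that are not.
# NA entries and duplicates are dropped first.
split_keys <- function(ids, valid) {
  ids <- unique(as.character(ids[!is.na(ids)]))
  found <- ids %in% valid
  list(valid = ids[found], invalid = ids[!found])
}

# Guarded so that the sketch runs even without the annotation package.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)

  gid <- c("1", "8")  # example IDs taken from the call above
  parts <- split_keys(gid, AnnotationDbi::keys(org.Hs.eg.db,
                                               keytype = "ENTREZID"))

  # Query only the IDs that org.Hs.eg.db knows about.
  if (length(parts$valid) > 0) {
    res <- AnnotationDbi::select(org.Hs.eg.db, keys = parts$valid,
                                 columns = "GENENAME", keytype = "ENTREZID")
    print(res)
  }

  # IDs that could not be resolved, e.g. to look up by hand at NCBI.
  print(parts$invalid)
}
```

Keeping the unresolved IDs in a separate vector, rather than silently dropping them, makes it easier to see afterwards which gene IDs were replaced or discontinued.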

 
