org.Hs.eg.db deprecation and TxDb usage
karl.stamm (@karlstamm-7254), United States

Some developers have run into a new WARNING from the automatic package builder (bioc devel 3.1): org.Hs.egCHR is deprecated, and we are told to look to other packages for transcript annotation.

A question was raised on the devel mailing list: how should we make these changes? We were redirected to this support site, but I don't see any discussion of the issue here, hence this new post.

I import org.Hs.eg.db into my package rgsepd to build hashtables that map RefSeq NM_## IDs to their associated Entrez gene IDs and HGNC names. I don't directly use any of the deprecated maps that are causing the warning (org.Hs.egCHR, org.Hs.egCHRLENGTHS, org.Hs.egCHRLOC, org.Hs.egCHRLOCEND).
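As a rough illustration of the build-once, look-up-many pattern described above, here is a minimal sketch. Python is used only for illustration; in rgsepd this would be R code. It assumes the mapping has already been exported from org.Hs.eg.db (for instance with AnnotationDbi's select() and write.table()) to a tab-delimited file with REFSEQ, ENTREZID and SYMBOL columns. The two inlined rows stand in for that file, and the helper name build_lookups is hypothetical.

```python
import csv
import io

# Stand-in for a hypothetical tab-delimited export from org.Hs.eg.db.
# Columns assumed: REFSEQ, ENTREZID, SYMBOL. The two rows are illustrative only.
EXAMPLE_TSV = """REFSEQ\tENTREZID\tSYMBOL
NM_000546\t7157\tTP53
NM_007294\t672\tBRCA1
"""


def build_lookups(handle):
    """Read the mapping table once and return two dicts keyed by RefSeq accession.

    If an accession appears on several rows, the last row wins; a one-to-many
    mapping would need lists as values instead.
    """
    refseq_to_entrez = {}
    refseq_to_symbol = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        refseq_to_entrez[row["REFSEQ"]] = row["ENTREZID"]
        refseq_to_symbol[row["REFSEQ"]] = row["SYMBOL"]
    return refseq_to_entrez, refseq_to_symbol


# Build the tables once; every later lookup is a constant-time dict access.
to_entrez, to_symbol = build_lookups(io.StringIO(EXAMPLE_TSV))
print(to_entrez["NM_000546"], to_symbol["NM_000546"])  # 7157 TP53

# Unknown accessions return None rather than raising.
print(to_entrez.get("NM_999999"))  # None
```

Because the tables live in memory after a single read, repeated queries never go back to a database or a web service.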

I just need an ID conversion table, but packages such as TxDb.Hsapiens.UCSC.hg19.knownGene give no usage information; the man pages are empty at http://www.bioconductor.org/packages/devel/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.knownGene.html

So I'd like to ask the community: how do we use these TxDbs, and what is the minimal data I can import into my package for comprehensive ID conversion? I work with both mRNA and ncRNA, so I might need TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts as well, but again its man pages are empty!

I know we can use BioMart for some lookups, but because I do a lot of repeated querying, an offline database seems the more efficient solution.

 

@james-w-macdonald-5106, United States

If you don't use any of those tables, then this is just a warning that you can ignore. You can still use org.Hs.eg.db to do the queries that you are making.

Marc Carlson (@marc-carlson-2264), United States

If you are curious about TxDb packages, or about annotation packages in general, I would start here:

http://www.bioconductor.org/help/workflows/annotation/annotation/

The more direct answer is that TxDb packages store transcriptomes, so they are a natural place to record where a gene, a transcript or an exon starts and ends. In contrast, OrgDb packages like org.Hs.eg.db contain information about genes, but where in the genome those genes lie depends on which genome build the coordinates are based on.

In this project many people are using a range of different genome builds, so we don't want to choose an arbitrary 'winner' (by picking one genome build to use with the org.Hs.eg.db package). By referring people to a transcriptome package that is already specific to a particular genome build, we hope to make things both more transparent and useful to a wider range of people.

 

Hope this helps,

 

 Marc
