Question

org.Hs.eg deprecation and txDb usage

karl.stamm ▴ 10

@karlstamm-7254

Last seen 2.1 years ago

United States

Some developers have run into a new WARNING on the automatic package builder (Bioconductor devel 3.1): org.Hs.egCHR is deprecated, and we are told to look to other packages for transcript annotation.

A question was raised on the devel mailing list: how should we make these changes? We were redirected to this support site, but I don't see any discussion of the issue here, hence this new post.

I import org.Hs.eg.db into my package rgsepd to build hashtables that map RefSeq NM_## IDs to their associated Entrez Gene ID and HGNC name. I don't directly use any of the mapping objects org.Hs.egCHR, org.Hs.egCHRLENGTHS, org.Hs.egCHRLOC, or org.Hs.egCHRLOCEND, which are the ones deprecated and causing the warning.

I just need an ID conversion table, but for packages like TxDb.Hsapiens.UCSC.hg19.knownGene no usage information is given. The man pages are empty at http://www.bioconductor.org/packages/devel/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.knownGene.html

So I need to ask the community: how do we use these TxDb packages, and what is the minimal data I can import into my package for comprehensive ID conversion? I work with both mRNA and ncRNA, so I might need TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts as well, but again the man pages are empty!

I know we can use biomaRt for some lookups, but as I do a lot of repeated querying, an offline database seems the more efficient solution.

org.hs.eg.db annotation • 1.3k views

updated 9.3 years ago by Marc Carlson ★ 7.2k • written 9.3 years ago by karl.stamm ▴ 10

score 0 · Answer 1 · 2015-01-16

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 5 hours ago

United States

If you don't use any of those tables, then this is just a warning that you can ignore. You can still use org.Hs.eg.db to do the queries that you are making.
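
For reference, the kind of RefSeq lookup described in the question might look like the sketch below. It assumes the `select()` and `mapIds()` interfaces from AnnotationDbi and the `REFSEQ`, `ENTREZID`, and `SYMBOL` columns of org.Hs.eg.db; the two accessions are illustrative examples only and are not taken from the original post.

```r
# Sketch: RefSeq -> Entrez Gene ID / gene symbol lookup with org.Hs.eg.db.
# Nothing here touches the deprecated org.Hs.egCHR* mappings.
library(AnnotationDbi)
library(org.Hs.eg.db)

# Illustrative RefSeq mRNA accessions (no version suffix).
refseq_ids <- c("NM_000546", "NM_007294")

# select() returns a data.frame with one row per (key, match) pair.
lookup <- AnnotationDbi::select(org.Hs.eg.db,
                                keys    = refseq_ids,
                                keytype = "REFSEQ",
                                columns = c("ENTREZID", "SYMBOL"))

# mapIds() returns a vector named by the input keys, which can serve
# as a simple hashtable-style lookup; multiVals = "first" keeps one
# match per key.
refseq_to_symbol <- mapIds(org.Hs.eg.db,
                           keys      = refseq_ids,
                           keytype   = "REFSEQ",
                           column    = "SYMBOL",
                           multiVals = "first")
```

To build a table for every accession rather than a hand-picked few, `keys(org.Hs.eg.db, keytype = "REFSEQ")` should give the full set of keys to pass in.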

score 0 · Answer 2 · 2015-01-16

If you are curious about TxDb packages, or about annotation packages in general, I would start here:

http://www.bioconductor.org/help/workflows/annotation/annotation/

The more direct answer is that TxDb packages are for storing transcriptomes, so they are the natural place to store information about where a gene, a transcript, or an exon starts and ends. In contrast, OrgDb packages like org.Hs.eg.db contain information about genes, but where in the genome those genes are located depends on which genome build the coordinates are based on...

Many people in this project currently use a range of different genome builds for their work, so we don't want to choose an arbitrary 'winner' by tying the org.Hs.eg.db package to one genome. By referring people to a transcriptome (TxDb) package that is already designed to be specific to a particular genome build, we hope to make things both more transparent and useful to a wider range of people.
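
As a rough illustration of that split, the sketch below pulls build-specific coordinates from a TxDb package instead of from the org.Hs.egCHR* mappings. It assumes the GenomicFeatures accessors `genes()` and `select()`, and that TxDb.Hsapiens.UCSC.hg19.knownGene keys its genes by Entrez Gene ID; the gene ID used is only an example.

```r
# Sketch: genome-build-specific coordinates from a TxDb package (hg19).
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Gene-level ranges as a GRanges object, one range per gene.
gene_ranges <- genes(txdb)

# Transcript coordinates for a single gene, looked up by gene ID
# ("7157" is an illustrative Entrez Gene ID).
tx_info <- AnnotationDbi::select(txdb,
                                 keys    = "7157",
                                 keytype = "GENEID",
                                 columns = c("TXNAME", "TXCHROM", "TXSTRAND",
                                             "TXSTART", "TXEND"))
```

Because the coordinates come from a package named after its genome build, swapping in a TxDb for a different build changes the positions without touching the gene-level ID lookups in org.Hs.eg.db.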

Hope this helps,

Marc