hugene10sttranscriptcluster.db is missing annotations compared to hgu133plus2.db
Stane (@stane-10974)

Hello, 

I have recently been working with annotation for the Affymetrix microarray platform GPL6244, using the hugene10sttranscriptcluster.db package to retrieve gene symbols.

I noticed that some symbols are missing compared to the manufacturer's annotation file on NCBI (the one with the awkward // separators in its fields). When I take the symbols that are in the NCBI file but not in hugene10sttranscriptcluster.db and intersect them with hgu133plus2.db, I find 397 symbols. In other words, some symbols that are missing from hugene10sttranscriptcluster.db, but present in the NCBI annotation file, are also present in hgu133plus2.db.
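The comparison described above can be sketched as plain set arithmetic. This is only an illustration in Python with made-up placeholder data: the accessions, symbols, and the three sets are hypothetical, and the sketch assumes that records in a field are separated by /// and that the symbol is the second //-separated item of each record, which should be checked against the actual GPL6244 file.

```python
def symbols_from_field(field):
    """Extract gene symbols from one '//'-delimited annotation field.

    Assumption: records are separated by '///' and the symbol is the
    second '//'-separated item of each record; '---' marks an empty value.
    """
    symbols = set()
    if not field or field.strip() == "---":
        return symbols
    for record in field.split("///"):
        parts = [p.strip() for p in record.split("//")]
        if len(parts) > 1 and parts[1] and parts[1] != "---":
            symbols.add(parts[1])
    return symbols


# Hypothetical placeholder rows standing in for the manufacturer file.
ncbi_fields = [
    "ACC_1 // SYM_A // description /// ACC_2 // SYM_B // description",
    "ACC_3 // SYM_C // description",
    "---",
]
ncbi_symbols = set()
for field in ncbi_fields:
    ncbi_symbols |= symbols_from_field(field)

# Hypothetical symbol sets standing in for the two annotation packages.
hugene_symbols = {"SYM_A"}
hgu133_symbols = {"SYM_A", "SYM_B", "SYM_D"}

# Symbols in the manufacturer file but not in the HuGene package ...
missing_from_hugene = ncbi_symbols - hugene_symbols
# ... and the subset of those that the U133 Plus 2 package does contain.
also_in_hgu133 = missing_from_hugene & hgu133_symbols

print(sorted(missing_from_hugene))
print(sorted(also_in_hgu133))
```

With the real data, the size of the last set would correspond to the 397 symbols mentioned above.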

What confuses me is that I thought the Bioconductor annotation packages were built from the same source data, then frozen for a few months, and were therefore in sync, but apparently they are not.

annotation affy
@james-w-macdonald-5106

The annotation packages are generated every release, and are then 'frozen' until the next release. But other than the fact that they are both intended to measure human samples (and are made by the same company), there is no reason to expect the HuGene 1.0 ST and HG-U133Plus2 arrays to have the same set of genes (or symbols).

Those arrays were designed five years apart (2001 for the U133Plus2 and 2006 for the HuGene), and are based on different gene databases (UniGene vs a combination of RefSeq and GenBank). I would expect a large degree of overlap, but not complete consistency.

In addition, all of the Bioconductor annotation packages are based on the idea that the Entrez Gene ID is the 'central gene ID'. We simply take the accession numbers (usually a combination of RefSeq and GenBank IDs) that Affy provides in the annotation file you reference, map those to Entrez Gene IDs, and then all other mappings (e.g. to HUGO symbols) are based on the mapping of Entrez Gene IDs onward. In fact, the chip-specific annotation packages like the hugene10sttranscriptcluster.db package don't have anything in them except for a probeset -> Entrez Gene ID mapping, and rely on the org.Hs.eg.db package to do all other mappings.

So there may well be instances where Affy says a given probeset maps to a particular gene symbol, but if NCBI doesn't agree, our annotation packages won't provide that mapping.
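To make the two-step lookup concrete, here is a minimal sketch in Python using toy dictionaries. All identifiers and symbols are hypothetical placeholders, and the dictionaries only stand in for the probeset -> Entrez Gene ID table of a chip package and the Entrez Gene ID -> symbol table of the organism package. It is not how the packages are implemented.

```python
# Step 1 (chip package): probeset -> Entrez Gene ID. Hypothetical values.
probeset_to_entrez = {
    "probeset_1": "gene_101",
    "probeset_2": "gene_102",
    "probeset_3": None,  # accession could not be mapped to a gene ID
}

# Step 2 (organism package): Entrez Gene ID -> symbol. Hypothetical values.
entrez_to_symbol = {
    "gene_101": "SYM_A",
    "gene_102": "SYM_B",
}


def symbol_for(probeset):
    """Resolve a symbol only via the Entrez Gene ID, never directly."""
    entrez_id = probeset_to_entrez.get(probeset)
    if entrez_id is None:
        return None
    return entrez_to_symbol.get(entrez_id)


# Symbols as listed in a (hypothetical) manufacturer annotation file.
vendor_symbols = {
    "probeset_1": "SYM_A",
    "probeset_2": "SYM_B_OLD",
    "probeset_3": "SYM_C",
}

# Probesets where the vendor symbol and the Entrez-based symbol differ.
disagreements = {
    probeset: (vendor, symbol_for(probeset))
    for probeset, vendor in vendor_symbols.items()
    if symbol_for(probeset) != vendor
}

for probeset, (vendor, via_entrez) in sorted(disagreements.items()):
    print(probeset, vendor, via_entrez)
```

In this toy case probeset_3 gets no symbol at all, because it never reaches an Entrez Gene ID, and probeset_2 gets whatever symbol is attached to its gene ID rather than the one in the vendor file. In R, the corresponding lookups would typically go through select() or mapIds() from AnnotationDbi on the chip package.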
