Question

How is the hgu133plus2.db database built from the Affymetrix probeset information?

arkajyoti.bhattacharya • 0

@arkajyotibhattacharya-11976

Hi,

I was trying to get the gene mapping for one of the Affymetrix datasets, and I found a discrepancy between the Affymetrix annotation file and the hgu133plus2.db database.

For example, the file downloaded from the Affymetrix website maps the probeset "1553011_at" to both of these Entrez IDs: 6872 /// 138474. In the hgu133plus2.db database, however, it is mapped only to 138474. Moreover, in every case where a probeset is mapped to multiple genes in the Affymetrix file, only one of them is chosen as the mapped gene in hgu133plus2.db. I would like to know on what basis the other genes are omitted from the mapping. Is there an algorithm that selects the mapped gene from among the multiple candidates?

Link to the Affymetrix file: http://www.affymetrix.com/Auth/analysis/downloads/na36/ivt/HG-U133_Plus_2.na36.annot.csv.zip

Regards,

Arkajyoti Bhattacharya

hgu133plus2.db • 1.2k views

Updated 8.2 years ago by James W. MacDonald • written 8.2 years ago by arkajyoti.bhattacharya

score 0 · Answer 1 · 2016-12-05

The default way to map the Affy probeset IDs to Entrez Gene IDs is to extract the 'Representative Public ID' from the csv file, along with the Entrez Gene ID. The Representative Public ID is either a GenBank or RefSeq ID, which in the case of this probeset is NM_153809.1. That ID is what is used to map the probeset ID to the Entrez Gene ID.

The Entrez Gene IDs are also parsed out of the annotation csv file, but they are only used to map probesets for which the primary mapping failed. In this case, NM_153809 maps to Entrez Gene ID 138474, so that's all you get from the ChipDb package.
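To make the two-step logic concrete, here is a minimal Python sketch of it. This is not the code that actually builds the ChipDb package; the function names and the one-entry lookup table are hypothetical, and keeping every csv-listed ID in the fallback branch is an assumption, since the answer only says the csv IDs are used when the primary mapping fails.

```python
def strip_version(accession):
    """Drop a trailing version suffix, e.g. 'NM_153809.1' -> 'NM_153809'."""
    return accession.split(".", 1)[0]


def parse_entrez_field(field):
    """Split an annotation csv cell such as '6872 /// 138474' into IDs."""
    return [part.strip() for part in field.split("///") if part.strip()]


def map_probeset(rep_public_id, csv_entrez_field, accession_to_entrez):
    """Primary mapping via the Representative Public ID, csv IDs as fallback."""
    primary = accession_to_entrez.get(strip_version(rep_public_id))
    if primary is not None:
        # The primary mapping succeeded, so the csv Entrez IDs are ignored.
        return [primary]
    # Assumption: when the primary mapping fails, keep all csv-listed IDs.
    return parse_entrez_field(csv_entrez_field)


# Hypothetical lookup table holding only the pair mentioned in this thread.
accession_to_entrez = {"NM_153809": "138474"}

result = map_probeset("NM_153809.1", "6872 /// 138474", accession_to_entrez)
print(result)
```

With the values for 1553011_at, the accession lookup succeeds, so the second ID in the csv field (6872) is never consulted. That matches what you see in hgu133plus2.db.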