How does BioC map from Probe ID to Entrez Gene?

Jacob Michaelson ▴ 320

Hi all,

I've finished up with an analysis and, in reviewing some of the annotations for gene symbols and RefSeqs, I've found some discrepancies that I don't know how to explain. The discrepancies are between the Affy-supplied annotation (both CSV and NetAffx) and the BioC annotation. Let's take this probe set as an example: 1558097_at

> sessionInfo()
Version 2.3.0 (2006-04-24)
i686-pc-linux-gnu

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
hgu133plus2
   "1.12.0"

> mget("1558097_at", hgu133plus2LOCUSID)
$`1558097_at`
[1] 8971

On NetAffx, the Entrez Gene ID shows 253143. I've got about 12 other probe sets that BioC and Affy disagree strongly on (symbols, RefSeqs, etc.). I suspect these can all be traced back to the Entrez Gene ID disagreement. Since much of BioC's subsequent annotation is based on the Entrez Gene ID, the correct mapping from the Affy probe ID to the Entrez Gene ID is crucial.

Which brings me to my question: how exactly does BioC map from Affy probe IDs to Entrez Gene IDs? There seems to be thorough documentation of how Entrez Gene IDs are mapped to other annotations like PubMed, GO, etc., but not much on how the Entrez Gene ID was mapped from the probe ID in the first place. My cursory "hand" examination, BLAST-ing Affy's probe sequences, tends to side with Affy.

Any enlightenment would be much appreciated.

Thanks,

Jake

Annotation GO probe affy • 1.6k views

Updated 19.7 years ago by John Zhang • written 19.7 years ago by Jacob Michaelson

John Zhang ★ 2.9k

> Which brings me to my question - how exactly does BioC map from Affy
> probe IDs to Entrez Gene IDs? There seems to be thorough documentation
> of how Entrez IDs are mapped to other annotations like Pubmed, GO, etc.
> but not much on how the Entrez Gene ID was mapped from the probe ID in
> the first place. My cursory "hand" examination tends to side with Affy,
> by BLAST-ing their probe sequences.

BioC takes the GenBank IDs associated with the probes (provided by the manufacturer) and maps them to Entrez Gene IDs using data from UniGene, Entrez Gene, and other available data sources we trust. The Entrez Gene ID a probe is assigned to is determined by votes from all the sources used. If there is no agreement among the sources, we take the smallest Entrez Gene ID.

> Any enlightenment would be much appreciated.
>
> Thanks,
>
> Jake

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
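To make the voting rule in this answer concrete, here is a minimal R sketch. It is not BioC's actual build code. The function name `pick_entrez_id`, the source names, and the example votes are all hypothetical, and it assumes "votes" means a simple plurality, with "no agreement" read as a tie for the most votes.

```r
# Hypothetical helper, not part of any Bioconductor package.
# `votes` is a named integer vector: names are data sources, values are the
# Entrez Gene IDs each source proposes for one probe set.
pick_entrez_id <- function(votes) {
  votes <- votes[!is.na(votes)]            # sources with no mapping do not vote
  if (length(votes) == 0) return(NA_integer_)
  counts <- table(votes)                   # number of votes per candidate ID
  top <- names(counts)[counts == max(counts)]  # candidate(s) with the most votes
  min(as.integer(top))                     # assumed tie rule: smallest ID wins
}

# Made-up votes for illustration only; NOT the real data behind 1558097_at.
agree    <- c(unigene = 253143L, entrezgene = 253143L, other = 8971L)
disagree <- c(unigene = 253143L, entrezgene = 8971L)

pick_entrez_id(agree)     # 253143 (two votes against one)
pick_entrez_id(disagree)  # 8971 (tie, so the smallest ID is taken)
```

Under this reading, a probe set whose sources split evenly between two genes ends up on the numerically smaller ID regardless of which gene the probe sequence matches best. That may be worth checking for the probe sets that disagree with NetAffx.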

Written 19.7 years ago by John Zhang
