Jacob Michaelson
Hi all,
I've finished an analysis and, in reviewing some of the annotations
for gene symbols and RefSeqs, I've found some discrepancies that I
don't know how to explain. The discrepancies are between the
Affy-supplied annotation (both CSV and NetAffx) and the BioC annotation.
Let's take this probe for example:
1558097_at
> sessionInfo()
Version 2.3.0 (2006-04-24)
i686-pc-linux-gnu
attached base packages:
[1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "base"
other attached packages:
hgu133plus2
"1.12.0"
> mget("1558097_at", hgu133plus2LOCUSID)
$`1558097_at`
[1] 8971
On NetAffx, the Entrez Gene ID shows 253143.
I've got about 12 other probe sets on which BioC and Affy disagree
strongly (symbols, RefSeqs, etc.). I suspect these can all be traced
back to the Entrez ID disagreement. Since much of BioC's subsequent
annotation is based on the Entrez Gene ID, the correct mapping from the
Affy probe ID to the Entrez Gene ID is crucial.
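To list disagreements like these systematically rather than by eye, a comparison along the following lines might help. This is only a minimal base-R sketch on toy data: the 1558097_at row uses the two IDs quoted above, while the other probe IDs, their Entrez IDs, and the names `bioc`, `affy`, `cmp` and `mismatch` are made up for illustration.

```r
# Toy data: only the 1558097_at row comes from this post; the other
# probe IDs and Entrez IDs are hypothetical placeholders.
bioc <- c("1558097_at" = "8971",
          "0000001_at" = "1111",
          "0000002_at" = "2222")
affy <- data.frame(
  probe_id = c("1558097_at", "0000001_at", "0000002_at"),
  entrez   = c("253143", "1111", "2222"),
  stringsAsFactors = FALSE
)

# Align the two sources by probe ID.
cmp <- data.frame(
  probe_id = affy$probe_id,
  bioc     = unname(bioc[affy$probe_id]),
  affy     = affy$entrez,
  stringsAsFactors = FALSE
)

# Keep rows where BioC has no ID or where the two IDs differ.
mismatch <- cmp[is.na(cmp$bioc) | cmp$bioc != cmp$affy, ]
print(mismatch)
```

With real data, `bioc` could presumably be built from the same `mget(..., hgu133plus2LOCUSID)` call shown above, and `affy` read from the Affy CSV; how that CSV's columns are named and parsed is left as an assumption here.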
Which brings me to my question: how exactly does BioC map from Affy
probe IDs to Entrez Gene IDs? There seems to be thorough documentation
of how Entrez IDs are mapped to other annotations like PubMed, GO,
etc., but not much on how the Entrez Gene ID was mapped from the probe
ID in the first place. My cursory "hand" examination, BLAST-ing Affy's
probe sequences, tends to side with Affy.
Any enlightenment would be much appreciated.
Thanks,
Jake