Paul Shannon
Here's an annotation question someone might be able to help me out
with. I'll be grateful.
Affymetrix describes their 'GeneChip Human Gene 1.0 ST Array':
Each of the 28,869 genes is represented on the array by approximately
26 probes spread across the full length of the gene, providing a more
complete and more accurate picture of gene expression than 3'-based
expression array designs. ... The Gene 1.0 ST Array uses a subset of
probes from the Human Exon 1.0 ST Array and covers only well-annotated
content.
This sounds to me like affy started with sequence from exons of ~29k
genes and created probes.
But when I look at the bioc annotation for this chip
(hugene10stprobeset.db, hugene10sttranscriptcluster.db), I find that
about 7% of the probesets are NOT annotated to geneIDs. The sibling
map, hugene10sttranscriptclusterENTREZID, though smaller, has a
higher proportion of unmapped keys (about 34%).
library(hugene10stprobeset.db)
library(hugene10sttranscriptcluster.db)
bm <- hugene10stprobesetENTREZID
length(keys(bm))         # 257022
count.mappedkeys(bm)     # 238141
# unmapped: 18881
cm <- hugene10sttranscriptclusterENTREZID
length(keys(cm))         # 33257
count.mappedkeys(cm)     # 21787
# unmapped: 11470
The same proportions (unmapped/mapped) seem to hold true for
hugene10stprobesetENSEMBL as well.
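To make the comparison concrete, here is a minimal sketch that turns the counts reported above into unmapped percentages. The data frame and the helper name `unmapped_keys` are only illustrative (they are not part of either annotation package), and the helper assumes AnnotationDbi is installed so that `keys()` and `mappedkeys()` can be called on a Bimap.

```r
# Counts copied from the output above: total keys and mapped keys per map.
counts <- data.frame(
  map    = c("hugene10stprobesetENTREZID",
             "hugene10sttranscriptclusterENTREZID"),
  total  = c(257022, 33257),
  mapped = c(238141, 21787)
)

# Unmapped keys and their share of all keys, as a percentage.
counts$unmapped     <- counts$total - counts$mapped
counts$pct_unmapped <- round(100 * counts$unmapped / counts$total, 1)
print(counts)

# Hypothetical helper: return the keys of a Bimap that have no mapping.
# Assumes AnnotationDbi is installed; it is not called here.
unmapped_keys <- function(bimap) {
  setdiff(AnnotationDbi::keys(bimap), AnnotationDbi::mappedkeys(bimap))
}
```

With the annotation packages loaded, something like `head(unmapped_keys(cm))` should list a few of the transcript cluster IDs that have no Entrez Gene ID, which could then be looked up individually.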
Can anyone suggest where I can get entrez geneID annotations for these
unmapped probes? Or otherwise clear up my confusion?
Thanks!
- Paul