So many unmapped probes
Ed Siefker

I have microarray data downloaded from ArrayExpress.  The annotation is listed as "pd.hugene.1.0.st.v1".  I'm trying to annotate them with hugene10sttranscriptcluster.db.  My problem is that a large number of probes map to no symbol or refseq.  Is this normal?

```

> mydata.rma
ExpressionSet (storageMode: lockedEnvironment)
assayData: 33297 features, 14 samples
  element names: exprs
protocolData
  rowNames: GSM946485_48-2.CEL GSM946484_48-1.CEL ... GSM946472_0-1.CEL
    (14 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM946485_48-2.CEL GSM946484_48-1.CEL ... GSM946472_0-1.CEL
    (14 total)
  varLabels: Source.Name Comment..Sample_source_name. ...
    FactorValue..TIME. (27 total)
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.1.0.st.v1
> ids<-rownames(exprs(mydata.rma))
> length(ids)
[1] 33297
> symbols <- AnnotationDbi::mapIds(hugene10sttranscriptcluster.db, ids, "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns
> sum(is.na(symbols))
[1] 11147
>
```
A third of these probes match no symbol.  That seems really high.  What's going on?  Is this the wrong .db package to use for this platform?

Tags: annotation, hugene10sttranscriptcluster.db
@james-w-macdonald-5106

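Your counting is off, primarily because you are naively assuming that everything on an Affy array actually measures something that should have a gene symbol. The array also carries control and background probesets that are not expected to map to any gene, so the fair comparison is against the 'main' probesets only.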
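A toy illustration of why the denominator matters. The ten probesets, their types and their symbols below are invented for illustration only (they are not taken from the HuGene array). The point is that counting unmapped IDs over every row mixes in control probesets that were never meant to map to a gene.

```r
# Invented toy data (NOT real HuGene content): probeset type and mapped symbol,
# with NA meaning "no symbol found".
probesets <- data.frame(
  id     = sprintf("ps%02d", 1:10),
  type   = c(rep("main", 6), rep("control", 4)),
  symbol = c("TP53", "GAPDH", "ACTB", NA, "MYC", "EGFR", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Naive: treat every probeset as something that should have a symbol
naive_frac <- mean(is.na(probesets$symbol))

# Restricted: only 'main' probesets are expected to measure annotated genes
is_main   <- probesets$type == "main"
main_frac <- mean(is.na(probesets$symbol[is_main]))

# Unmapped fraction broken down by probeset type
by_type <- tapply(is.na(probesets$symbol), probesets$type, mean)

naive_frac  # 0.5
main_frac   # one of the six main probesets is unmapped
by_type
```

Half of the toy rows look "unmapped", yet only one of the six main probesets actually lacks a symbol; the rest of the NAs come from the controls.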

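```
> library(pd.hugene.1.0.st.v1)              # pdInfo package for this platform
> library(hugene10sttranscriptcluster.db)   # transcript-cluster annotation (attaches AnnotationDbi)
> con <- db(pd.hugene.1.0.st.v1)            # connection to the pdInfo package's SQLite database
> ## type = '1' selects the 'main' probesets
> ids <- dbGetQuery(con, "select transcript_cluster_id from featureSet where type='1';")[,1]
> length(ids)
[1] 253002
> sum(is.na(mapIds(hugene10sttranscriptcluster.db, as.character(ids), "ENTREZID", "PROBEID")))
'select()' returned 1:many mapping between keys and columns
[1] 7567
```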

That means about 3% (7567 of 253002) of the 'main' probesets have no Entrez Gene ID; those are probably some speculative content, lincRNAs, or whatnot.
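The proportions can be checked directly from the counts printed in the two console sessions: 11147 of 33297 row names in the question, and 7567 of 253002 main-probeset IDs in the answer. A minimal base-R check; `pct()` is only an illustrative helper, not part of any package.

```r
# Counts copied from the console output in the question and the answer
q_unmapped <- 11147; q_total <- 33297    # all rows of mydata.rma, SYMBOL mapping
a_unmapped <- 7567;  a_total <- 253002   # type = '1' IDs, ENTREZID mapping

# Illustrative helper: percentage of IDs with no mapping, one decimal place
pct <- function(n, total) round(100 * n / total, 1)

pct(q_unmapped, q_total)  # 33.5
pct(a_unmapped, a_total)  # 3
```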
