I made an ExpressionSet from Affy U133A spreadsheet data:
```
> testExp
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22277 features, 43 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM637758 GSM637759 ... GSM637800 (43 total)
  varLabels: Group
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
```
I did not use the hgu133a.db annotation package (built on AnnotationDbi). Instead, I would like to annotate the data using my own annotation file ("anno"):
```
> head(anno)
      PROBEID         ENSEMBL
1   209108_at ENSG00000000003
2 209109_s_at ENSG00000000003
3   220065_at ENSG00000000005
4   202673_at ENSG00000000419
5 205607_s_at ENSG00000000457
6 220840_s_at ENSG00000000457
> dim(anno)
[1] 37627     2
```
The number of features differs between the ExpressionSet (22,277 probes) and the annotation file (37,627 rows). What I need is a final ExpressionSet containing only probes with a valid ENSEMBL ID; that is, every Affy ID with no matching ENSEMBL ID should be deleted from the ExpressionSet.
Is this possible? I would appreciate a detailed protocol.
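To remove rows from anno that do not map to an "ENSG" ID, you can subset anno on its ENSEMBL column, keeping only the entries that start with "ENSG".

I suspect that is only half of the problem. You will likely still have cases where a single probe ID maps to multiple ENSG IDs (anno has 37,627 rows, far more than the 22,277 probes in the ExpressionSet, which suggests exactly this). Unfortunately, there is no simple solution to that: you will need to make some arbitrary decisions, such as which ENSG to keep for each probe.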
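A minimal sketch of the whole workflow in R, assuming the Biobase package (Bioconductor) is installed and that anno is a data.frame with character columns PROBEID and ENSEMBL as shown above. The rule "keep the first ENSG listed for each probe" is one arbitrary choice among several, not a recommendation. The toy `expr` matrix and the `anno` rows (including the second ENSG for 220065_at and the NA for the control probe) are hypothetical data made up for illustration only.

```r
library(Biobase)

## --- Toy stand-ins for the poster's objects (hypothetical data) ---
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(c("209108_at", "209109_s_at", "220065_at", "AFFX-BioB-5_at"),
                               c("GSM1", "GSM2", "GSM3")))
testExp <- ExpressionSet(assayData = expr)

anno <- data.frame(
  PROBEID = c("209108_at", "209109_s_at", "220065_at", "220065_at",
              "202673_at", "AFFX-BioB-5_at"),
  # Hypothetical: the second ENSG for 220065_at and the NA for the control
  # probe are invented to show a one-to-many mapping and an unmapped probe.
  ENSEMBL = c("ENSG00000000003", "ENSG00000000003", "ENSG00000000005",
              "ENSG00000000419", "ENSG00000000419", NA),
  stringsAsFactors = FALSE
)

## --- Step 1: keep only annotation rows with an ENSG ID ---
# grepl() returns FALSE for NA, so probes with no ENSEMBL ID are dropped too.
anno <- anno[grepl("^ENSG", anno$ENSEMBL), ]

## --- Step 2: resolve one-to-many mappings (arbitrary rule) ---
# Keep only the first ENSG listed for each probe ID.
anno <- anno[!duplicated(anno$PROBEID), ]

## --- Step 3: drop probes without a valid ENSG from the ExpressionSet ---
keep <- featureNames(testExp) %in% anno$PROBEID
filtered <- testExp[keep, ]

## --- Step 4: store the ENSG ID of each remaining probe in featureData ---
fData(filtered)$ENSEMBL <- anno$ENSEMBL[match(featureNames(filtered), anno$PROBEID)]

print(fData(filtered))
```

With real data, skip the toy section and run steps 1 to 4 on your own testExp and anno. The probe IDs are kept as featureNames, with the ENSG IDs stored in fData, because several probes can map to the same ENSG (209108_at and 209109_s_at both map to ENSG00000000003 in head(anno)), so ENSG IDs would not be unique row names without further collapsing. If you prefer a stricter rule for step 2, you could instead drop every probe that maps to more than one ENSG.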