I am analysing an Affymetrix MoGene 2.0 ST array, and I would like to collapse all cross-hybridizing probesets into a single transcript cluster. That is, I want to reduce all rows whose probesets map to different transcripts of the same gene to a single row.
The first approach I thought of was to annotate every probeset with its corresponding gene_id, using the probeset -> gene mapping that can be extracted from BioMart or the Affymetrix website. Then, for every gene represented by more than one probeset, I would keep only one of them, chosen either at random or as the one with the maximum expression value.
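To make that first idea concrete, here is a rough sketch of what I mean. It is written in plain Python purely for illustration (my actual data are in R, processed with 'oligo'), and the probeset ids, gene names, and the helper `collapse_to_genes` are all made up. It also assumes that "maximum expression value" means the highest mean expression across samples, which is only one possible reading.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(expr, probeset_to_gene):
    """Keep, for each gene, the probeset with the highest mean expression.

    expr: dict mapping probeset id -> list of expression values (one per sample)
    probeset_to_gene: dict mapping probeset id -> gene id (hypothetical annotation)
    Probesets without a gene annotation are dropped.
    """
    # Group candidate probesets by the gene they are annotated to.
    by_gene = defaultdict(list)
    for probeset, values in expr.items():
        gene = probeset_to_gene.get(probeset)
        if gene is not None:
            by_gene[gene].append((mean(values), probeset))

    # For each gene, keep the row of the probeset with the largest mean.
    # Ties on the mean are broken by probeset id (tuple comparison).
    collapsed = {}
    for gene, candidates in by_gene.items():
        _, best_probeset = max(candidates)
        collapsed[gene] = expr[best_probeset]
    return collapsed


# Toy probeset-level matrix: two samples, four probesets (all values invented).
expr = {
    "ps1": [5.0, 6.0],
    "ps2": [8.0, 9.0],
    "ps3": [3.0, 3.5],
    "ps4": [7.0, 7.0],  # no gene annotation, so it is dropped
}
probeset_to_gene = {"ps1": "GeneA", "ps2": "GeneA", "ps3": "GeneB"}

gene_expr = collapse_to_genes(expr, probeset_to_gene)
print(gene_expr)
```

In this toy example GeneA has two probesets and only the more highly expressed one (ps2) survives, so the result has one row per gene. Picking a random probeset instead would just mean replacing the `max` call with a random choice.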
However, I have read about some custom CDF files that apparently already do this, and which can be downloaded from http://nmg-r.bioinformatics.nl/NuGO_R.html. I have never used them, and almost all the tutorials I can find are based on the old Affymetrix 3' arrays, not the newer ST ones.
To start with, I cannot figure out how to load such files and use them to collapse my probeset-level expression matrix into a transcript-level expression matrix. By the way, I am using 'oligo' for the processing and normalization. How can I do this? Can someone point me to a tutorial or documentation?