I am analysing an Affymetrix Mogene 2.0 ST array, and I would like to collapse all cross-hybridizing probesets into a single transcript cluster. That is, reduce all rows with probesets mapping to different transcripts to a single row.
A first approach I thought of was to annotate all probesets with the corresponding gene_id, using the probeset -> gene mapping that can be extracted from BioMart or the Affymetrix website. Then, for every gene represented by more than one probeset, keep only one of them: either a random one or maybe the one with the maximum expression value.
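The annotate-then-pick-one idea described above could be sketched roughly as follows. This is a plain Python illustration of the logic only, not oligo code: the function name `collapse_by_gene`, the probeset IDs, the gene IDs and the expression values are all made up, and "maximum expression" is assumed here to mean the highest mean across samples.

```python
from statistics import mean


def collapse_by_gene(expr, probeset_to_gene):
    """Keep, for each gene, the probeset row with the highest mean expression.

    expr:             dict mapping probeset ID -> list of expression values
    probeset_to_gene: dict mapping probeset ID -> gene ID (hypothetical mapping)
    """
    best = {}  # gene ID -> (probeset ID, expression values)
    for probeset_id, values in expr.items():
        gene_id = probeset_to_gene.get(probeset_id)
        if gene_id is None:
            continue  # unannotated probesets are simply dropped in this sketch
        current = best.get(gene_id)
        # Strict '>' means that on a tie the first probeset seen is kept.
        if current is None or mean(values) > mean(current[1]):
            best[gene_id] = (probeset_id, values)
    return {gene_id: values for gene_id, (_, values) in best.items()}


# Toy data, invented for illustration only.
expr = {
    "ps1": [5.0, 6.0, 7.0],
    "ps2": [8.0, 9.0, 10.0],
    "ps3": [4.0, 4.5, 5.0],
    "ps4": [2.0, 2.0, 2.0],  # has no gene annotation below
}
probeset_to_gene = {"ps1": "geneA", "ps2": "geneA", "ps3": "geneB"}

collapsed = collapse_by_gene(expr, probeset_to_gene)
```

Picking the row with the highest mean is only one possible rule; taking a random probeset, or the one with the highest variance, would need only a change to the comparison line.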
However, I read about some CDF files which apparently have already done that, and they can be downloaded at: http://nmg-r.bioinformatics.nl/NuGO_R.html. I have never used them, and almost all tutorials are based on the old Affymetrix 3' arrays, not the new ST ones.
To start with, I cannot figure out how to load such files and use them to collapse my probeset-level expression matrix into a transcript-level expression matrix. By the way, I am using 'oligo' for the processing and normalization. How could I do it? Can someone point me to a tutorial or documentation?
Thanks.
Thanks a lot for the answer, Guido.
After some googling I realized that someone had already made the CDF files to work with 'oligo': https://bioconductor.org/packages/release/data/annotation/html/pd.mogene.2.0.st.html
Thus, I performed the analysis as follows:
So now I have a matrix with normalized expression values for each probeset. Where can I find the transcript annotation for each of these probesets? I thought this information was contained in the CDF provided by pd.mogene.2.0.st. How can I extract it?
Guido - You may already know this, or maybe not. In the last release, Benilton somewhat silently added code to pdInfoBuilder that allows you to generate a pdInfoPackage based on the MBNI CDFs that Manhong produces, so you can now use oligo for those as well.
Thanks, James, for this info; I wasn't aware of that! However, I had a look at the vignette and help pages of pdInfoBuilder, but could not directly find more on this. I assume this is possible along the same line(s of code) you previously posted here: "A: How to use brainarray custom cdf with oligo package?"
Exactly. I should congratulate you on your ability to find things on this site. I can't find things to save my life, so when people come up with just the exact post I am thinking of (that I cannot find) I am simply amazed.
And that is what I mean by 'somewhat silently'. There are some additions to the man page for makePdInfoPackage and chipName, but otherwise not much. I intend to add something but haven't had the time.