Question: How to : Preprocessing of Gene ST array with Oligo
giroudpaul (France) wrote, 3.5 years ago:

Dear Bioconductor members,

I am a self-taught beginner in microarray analysis, but I have managed to perform simple quality control, preprocessing and DE analysis using the affy and limma packages.

Now I am starting to work with Gene ST arrays, and I will soon have to work with raw data from HTA arrays.

So I tried to switch to the oligo package. However, its documentation is built around the HGU95 arrays and is not easy for a newcomer like me to adapt to Gene ST arrays.

So my question is: do you know of more recent documentation on using the oligo package with recent WT microarrays?

I also have a few questions that arose from my first attempts with oligo:

1. On GEO, my dataset came as .CEL files together with .rma-gene-full.chp files. Are the .chp files important?
2. After reading the CEL files, I got a GeneFeatureSet (called data). Why doesn't exprs(data) have probe names as row names? Is that normal? Is it a problem? How can I fix it?
3. Why is it that:
> length(featureNames(data))
[1] 2598544
> length(probeNames(data))
[1] 1025088
> data.rma = rma(data)
> length(featureNames(data.rma))
[1] 53617
> data.rma = rma(data, target="probeset")
> length(featureNames(data.rma))
[1] 352859
4. After I run rma, I get row names for exprs(data.rma), but they are just numbers from 16650001 to 17127721 (not consecutive, though).
5. oligo found the annotation by itself (pd.hugene.2.0.st) and attached it to the GeneFeatureSet, but what does that mean? How do I access these annotations?

I would be very thankful if you could explain some of this to me (or at least point me to relevant documentation)!

Answer: How to : Preprocessing of Gene ST array with Oligo
James W. MacDonald (United States) wrote, 3.5 years ago:
1. Not particularly, unless you want to use Affy processed data.
2. Why would that be a problem? Do you need to know something in particular about the probes that having a name would fulfill? In other words, the probes are uninteresting to the vast majority of people - it's the summarized probesets that matter.
3. featureNames dispatches on eSet, and simply tells you how many probes you have. Put another way, featureNames on a GeneFeatureSet is an uninformative function. probeNames returns the probeset name that each probe is mapped to, without removing the duplicates. This is explained in ?probeNames. And if you summarize the probes at different levels, you get different numbers of probesets: the default is to summarize at the 'core' level, which is roughly transcript-level, and you then summarized at the 'probeset' level, which is what Affy calls the 'probe set region' (PSR) and is roughly exon-level, although there is often more than one PSR per exon.
4. Yes. Affy went away from things like 1007_s_at, which looked informative but were not, to numbers, which neither look informative nor are informative. Again, knowing that a probeset is called 1007_s_at doesn't mean anything, per se, or at least nothing more than knowing a probeset is called 123456. They are just IDs, after all.
5. No, pd.hugene.2.0.st is not the annotation. That's the pdInfoPackage, which is used to map probes to probesets. It's not what one would conventionally call an annotation package, because it doesn't directly help you map your probesets to the functional information we have about what a given probeset is intended to measure.
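
To make point 3 concrete, here is a minimal sketch that collects the same counts at both summarization levels. The function name compare_summaries and the argument cel_dir are made up for illustration, and the directory is assumed to contain only the Gene ST CEL files of one experiment; the pdInfoPackage (here pd.hugene.2.0.st) is assumed to be installed.

```r
# Sketch only: `compare_summaries` and `cel_dir` are hypothetical names.
compare_summaries <- function(cel_dir) {
  # Find and read the CEL files; for Gene ST arrays this gives a GeneFeatureSet
  cel_files <- oligoClasses::list.celfiles(cel_dir, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)

  # rma() summarizes probes into probesets; `target` selects the level
  eset_core <- oligo::rma(raw, target = "core")          # roughly transcript-level
  eset_probeset <- oligo::rma(raw, target = "probeset")  # roughly exon-level (PSR)

  # probeNames() has one entry per probe, so probeset IDs repeat;
  # unique() collapses them to the distinct IDs
  ids <- oligo::probeNames(raw)

  list(
    n_features       = length(Biobase::featureNames(raw)),
    n_probe_mappings = length(ids),
    n_unique_ids     = length(unique(ids)),
    n_core           = length(Biobase::featureNames(eset_core)),
    n_probeset       = length(Biobase::featureNames(eset_probeset))
  )
}
```

Calling compare_summaries("path/to/cel_files") (a placeholder path) should show that the number of distinct IDs from probeNames is far smaller than the number of probes, and that the 'probeset' summary has many more rows than the 'core' summary.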

<shameless plug>

The easiest way I know to annotate your FeatureSet is to use the annotateEset function in the affycoretools package. Depending on what you use as annotation input (you can use the pdInfoPackage if you want, or the hugene20sttranscriptcluster.db package), it will automatically put the correct annotations in the featureData slot of your FeatureSet object, and when you use limma to do the analysis, the annotations will automatically get propagated to your topTable output.

</shameless plug>
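
As a rough sketch of that workflow, assuming eset is the ExpressionSet returned by rma() at the default 'core' level and group is a two-level factor describing the samples (both names, and the wrapper annotate_and_rank, are hypothetical):

```r
# Sketch only: `annotate_and_rank`, `eset` and `group` are hypothetical names.
annotate_and_rank <- function(eset, group) {
  # Transcript-cluster annotation package matching the 'core' summarization
  chip_db <- hugene20sttranscriptcluster.db::hugene20sttranscriptcluster.db

  # Fill the featureData slot with annotation columns (gene symbols etc.)
  eset <- affycoretools::annotateEset(eset, chip_db)

  # Simple two-group comparison; assumes `group` has exactly two levels
  design <- stats::model.matrix(~ group)
  fit <- limma::eBayes(limma::lmFit(eset, design))

  # The annotation columns from featureData are carried into the result table
  limma::topTable(fit, coef = 2)
}
```

After annotateEset, Biobase::fData(eset) is where you would look at the annotation columns directly.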

Well, thank you, that's all clear now. I guess I was confused by the row names, since I wasn't sure whether they were IDs or not.

Thank you for the tip about annotateEset; it seems easier than what I was doing before.