Question

What is fitProbeLevelModel() doing?

Nathaniel ▴ 20

@nathaniel-9283

Denmark

I am doing quality control for an Affymetrix Mouse Gene ST 2.0 microarray using the RMA method. As far as I understand, this method performs a background correction, a log2 transformation and then a quantile normalization. This ensures that the distribution of intensities is the same across the different microarrays.

However, I have read that a summarisation step follows the previous ones, in which a linear model is fit to each probe set, accounting for the probe affinity effect, the log-scale expression level for the array and an error term. Also, as far as I understand, this is done with the fitProbeLevelModel() function of the oligo package.

When I read the documentation of the function, it states the following:

Fits robust Probe Level linear Models to all the (meta)probesets in an FeatureSet. This is carried out on a (meta)probeset by (meta)probeset basis.

fitProbeLevelModel(object, background=TRUE, normalize=TRUE, target="core", method="plm", verbose=TRUE, S4=TRUE, ...)

My questions are the following:

(1) What is the aim of this linear model for quality control?

(2) What is the probe affinity effect?

(3) The function says that it summarises (meta)probesets. What is the difference between probesets and (meta)probesets?

(4) The function background corrects and normalises by default. Wasn't this step supposed to be done before? If I have already normalized using RMA, should I set the parameter normalize=TRUE?

affymetrix oligo microarray • 1.5k views

Updated 8.4 years ago by James W. MacDonald 65k • written 8.4 years ago by Nathaniel ▴ 20

score 2 · Answer 1 · 2015-12-07

The primary use for this function (at least for me) is to generate standard errors from the model fit that you can then use to make NUSE plots, or images of the arrays, using either the standard errors or residuals. So to answer your questions:

(1) Yes, it's for quality control. If you read all the way to the end of the help page, you will see that the example is simply used to generate NUSE, RLE and image plots.

(2) Each probeset is based on multiple 25-mers, which all have different amounts of GC content. It turns out that the GC content of a 25-mer has a huge effect on how well that probe binds to its complementary target (and, given high enough GC content, how well it binds to just about anything at all). Since the goal of summarizing the probesets is to estimate the underlying transcript abundance, rather than how well a particular probe binds to stuff, the probe-specific binding is a 'nuisance variable' that we want to estimate and then promptly ignore. In other words, we care about this effect only because we know it exists and can bias our results, not because we want to know anything about it. If you want to know more, you can read the old RMA papers by Irizarry and Bolstad.

(3) The Gene ST arrays can be summarized at two levels. You can summarize at the 'probeset' level, which is what Affy calls a 'probe set region' (PSR) and is roughly an exon-level summarization, or you can pile all the probes that interrogate a given transcript into one (meta)probeset (the 'core' level in this instance), which then gives you a transcript-level summary.

(4) You don't run this function on summarized data. You run it on a GeneFeatureSet object, which contains raw probe-level data. You would usually leave normalize set to TRUE (the default).
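
To make the 'nuisance variable' idea in (2) concrete, here is a minimal sketch in base R of a probe-level model for a single simulated probeset. Everything in it is an assumption for illustration: the chip and probe effects are made-up numbers, and `medpolish()` (median polish) is used as a simple stand-in for the robust fit, so this is not what fitProbeLevelModel() does internally.

```r
# Toy probe-level model for ONE probeset:
#   log2 intensity = chip (array) effect + probe affinity effect + error
set.seed(1)

# Hypothetical values, chosen only for illustration
chip_effect  <- c(8, 8.5, 10, 9)                    # log2 expression per array
probe_effect <- c(-1.5, -0.5, -0.1, 0.1, 0.7, 1.3)  # probe affinities, median 0

n_probes <- length(probe_effect)
n_arrays <- length(chip_effect)

# Rows are probes, columns are arrays
y <- outer(probe_effect, chip_effect, "+") +
  matrix(rnorm(n_probes * n_arrays, sd = 0.1), n_probes, n_arrays)
dimnames(y) <- list(paste0("probe", seq_len(n_probes)),
                    paste0("array", seq_len(n_arrays)))

# Median polish splits y into overall + row (probe) + column (array) + residual
fit <- medpolish(y, trace.iter = FALSE)

expr_est     <- fit$overall + fit$col  # per-array summary: what we care about
affinity_est <- fit$row                # probe affinities: estimated, then ignored
resid        <- fit$residuals          # leftovers: the raw material for QC

print(round(expr_est, 2))
print(round(affinity_est, 2))
```

The per-array estimates play the role of the summarized expression values, while the residuals are the part of the fit that quality control looks at. On real data the equivalent step is to call fitProbeLevelModel() on the GeneFeatureSet and then make the NUSE, RLE and image plots shown in the help page example.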