Arne.Muller@aventis.com
Hello,
I've done some analysis of a factorial design at the probe level
(Affymetrix). The data were background-corrected with RMA and
normalized via quantiles, but the probes were not summarized into
probe sets.
This means I have about 16 normalized intensity measures per gene and
condition. I include the probe as a factor; the model is similar to
Wolfinger et al. (2002), except that I'm only considering fixed
effects (Wolfinger uses the array as a random effect):
Y = B+D+T+P + BP + BD + BT
B = Batch or Laboratory effect (3 levels)
D = Dose (5 levels)
T = Time (2 levels)
P = Probe (~16 levels)
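As a rough illustration of how large this model is per gene, the sketch below counts the coefficients implied by the formula. It assumes treatment (dummy) coding, exactly 16 probes, and a complete design; the missing dose level in one batch would lower the count and is not modelled here. The names `levels`, `terms`, and `term_df` are made up for this example.

```python
# Levels per factor as stated in the post; P uses the approximate value of 16.
levels = {"B": 3, "D": 5, "T": 2, "P": 16}

# Terms of the model Y = B + D + T + P + BP + BD + BT
terms = ["B", "D", "T", "P", "BP", "BD", "BT"]


def term_df(term):
    """Degrees of freedom of a main effect or interaction.

    Under dummy coding with a complete design, this is the product of
    (levels - 1) over the factors that appear in the term.
    """
    df = 1
    for factor in term:
        df *= levels[factor] - 1
    return df


df_by_term = {t: term_df(t) for t in terms}
n_params = 1 + sum(df_by_term.values())  # +1 for the intercept

for t in terms:
    print(t, df_by_term[t])
print("coefficients per gene (incl. intercept):", n_params)
```

Most of the coefficients come from the P and BP terms, which is one reason the probe-level fit is more expensive than a probe-set-level fit.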
Within each B/T/D combination I have 2 to 4 replicates, and a dose
level is missing for one batch and its time points.
I'm sure there are BD and BT interactions, but the probe may just
interact with the batch. I could actually run the full model, but it
takes a lot of processing time for 12,000 genes.
This model would actually run for each gene on the chip.
I found the R-squared values are quite good (>0.9), but the residuals
are not normally distributed. They have a sort of normal "core", but
there are many extreme values seen in a qqnorm plot, which curves off
quite a lot already near the middle of the plot. Also, a Shapiro or KS
test shows that the residuals for nearly all genes are not normal.
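To make the "normal core plus extreme values" pattern concrete, here is a minimal sketch using only the Python standard library. It works on simulated residuals, not on the data from this analysis: the contaminated sample (90% N(0,1), 10% N(0,5)) is an invented stand-in for heavy-tailed residuals. `excess_kurtosis` and `qq_points` are hypothetical helpers; `qq_points` returns the coordinates that a normal QQ plot such as qqnorm would display, after standardizing the residuals.

```python
import random
from statistics import NormalDist, mean, pstdev


def excess_kurtosis(x):
    """Sample excess kurtosis; about 0 for normal data, > 0 for heavy tails."""
    m, s = mean(x), pstdev(x)
    return mean([((v - m) / s) ** 4 for v in x]) - 3.0


def qq_points(x):
    """Pairs (theoretical normal quantile, standardized sample quantile)."""
    n = len(x)
    m, s = mean(x), pstdev(x)
    observed = sorted((v - m) / s for v in x)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000

# Reference case: residuals that really are normal.
normal_resid = [rng.gauss(0, 1) for _ in range(n)]

# Assumed contamination model: 10% of residuals have a 5x larger sd.
heavy_resid = [rng.gauss(0, 5.0 if rng.random() < 0.1 else 1.0)
               for _ in range(n)]

print("excess kurtosis, normal:", round(excess_kurtosis(normal_resid), 2))
print("excess kurtosis, heavy :", round(excess_kurtosis(heavy_resid), 2))

# In the heavy-tailed case the extreme observed quantiles lie far from the
# theoretical ones, which is the curvature seen in the QQ plot.
theo_max, obs_max = qq_points(heavy_resid)[-1]
print("largest quantile, theoretical vs observed:",
      round(theo_max, 2), round(obs_max, 2))
```

With thousands of residuals per gene, formal tests like Shapiro or KS will flag even modest departures of this kind, so a tail summary or the QQ plot itself is often more informative than the p-value alone.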
My question is whether some of you have observed this, too, and what
you've done about it. Does limma perform any model checking?
I've actually observed a similar non-normality for a 'by-gene' level
model (a model considering only the probe set measurement). In
addition, the by-gene level analysis has a rather poor R-squared (most
genes are around 0.7).
kind regards,
thanks + your comments,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com