Hello,

A while ago I posted "programming problem: running many ANOVAs" (I actually got a very sophisticated reply, too sophisticated for me :-( ...). Following that posting I came across another problem with linear models.

I usually run a simple linear model including all my factors (dose, time, batch) for each probeset on the array, i.e. I construct and run >12,000 linear models and ANOVAs. The model could be:

    Value ~ batch + time + dose

I was thinking about running just a single linear model that includes the probes (actually the probesets, i.e. the genes):

    Value ~ gene + batch + time + dose + gene*batch + gene*time + gene*dose

The gene (probeset) interacts with each main effect. The actual data frame would look like this:

    Value     batch  time  dose   gene
    5.225589  NEW    24h   000mM  100001_at
    5.207835  NEW    24h   000mM  100001_at
    4.138210  NEW    24h   000mM  100001_at
    7.253535  OLD    24h   000mM  100001_at
    ...
    4.018591  PRG    04h   025mM  100001_at
    7.205778  PRG    04h   000mM  100001_at
    8.191978  NEW    24h   000mM  100002_at

I'm absolutely not sure about this. There are several problems:

1. What about the degrees of freedom? They're huge.
2. I don't know how to interpret summary(fit).
3. It is computationally impossible (on my machine) ;-( ...

I'm more interested in whether anybody here has already tried this seriously, i.e. worked on the statistical theory + biological interpretation.

kind regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
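
P.S. To make the two setups concrete, here is a minimal sketch in R on simulated data. It is only an illustration, not my real code: the data frame `d`, the toy sizes (3 probesets, 3 replicates per condition), the random values and the use of `split`/`lapply` are all assumptions made for the example.

```r
set.seed(1)

# Toy design (hypothetical sizes; the real array has >12,000 probesets):
# 3 probesets x 3 batches x 2 times x 2 doses x 3 replicates = 108 rows.
design <- expand.grid(rep   = 1:3,
                      batch = c("NEW", "OLD", "PRG"),
                      time  = c("04h", "24h"),
                      dose  = c("000mM", "025mM"),
                      gene  = c("100001_at", "100002_at", "100003_at"),
                      stringsAsFactors = TRUE)
d <- design[, c("batch", "time", "dose", "gene")]
d$Value <- rnorm(nrow(d), mean = 6, sd = 1)   # simulated expression values

# Approach 1: one small linear model and ANOVA table per probeset.
per_gene <- lapply(split(d, d$gene), function(x) {
  anova(lm(Value ~ batch + time + dose, data = x))
})

# Approach 2: a single model in which gene interacts with every main effect.
# gene * (batch + time + dose) expands to
# gene + batch + time + dose + gene:batch + gene:time + gene:dose.
fit_all <- lm(Value ~ gene * (batch + time + dose), data = d)

length(per_gene)       # number of per-probeset ANOVA tables
df.residual(fit_all)   # residual degrees of freedom of the combined model
```

In this toy case the combined model ends up with 93 residual degrees of freedom, which is the sum of the 31 left over in each of the three per-probeset fits. With >12,000 probesets the same formula would need a design matrix with several coefficients per gene, which is presumably why it does not fit on my machine.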
