Question: robust linear model in anovas
0
15.5 years ago by
Arne.Muller@aventis.com620 wrote:
Hello, I'm trying to analyze some toxicology data, and I got some problems with lm and rlm for which I'm looking for help. The experiment is a 3 factor design: - the value (the y vector) is a response for a treatment with a drug - a dose factor with 5 levles (a drug in 5 concentrations) - a time factor with 2 levels (time of treatment) - a batch factor with 3 levels corresponding to the laboratories that analyzed the data Within each dose, time and batch I've 3 "technical" replicates. I'm looking for genes for which the expression is altered mainly by the dose at any of the time points. Ideally I'd just like to merge the 3 replicates produced by each lab into one group with then 9 replicates (i.e. omitting the batch factor). Unfortunately the batch factor is very strong and introduces quite a large variance within each time and dose level. I know that sometimes one laboratory is quite a strong outlier, but the "extreme" lab changes from gene to gene! When looking at a stripchart for the dose (within one timepoint) and merged batches the datapoints are scattered, and often 2 or 3 of the replicates from one lab are >2 standard deviations away from the mean. I'm doing a simple lm to quantify the main effects (creating an lm for each gene). fit <- lm(value ~ dose + time + batch, d) > summary(fit)$coefficients Estimate Std. Error t value Pr(>|t|) (Intercept) 6.772575997 0.04040316 167.6249085 2.174127e-99 dose010mM -0.038755967 0.04435994 -0.8736704 3.850501e-01 dose025mM -0.028481675 0.04598919 -0.6193123 5.375628e-01 dose050mM -0.069415506 0.05087185 -1.3645171 1.764316e-01 dose100mM 0.008306756 0.04435994 0.1872580 8.519573e-01 time24h 0.015798762 0.02976119 0.5308511 5.970699e-01 batchOLD 0.134205422 0.03534714 3.7967832 2.930010e-04 batchPRG 0.040506343 0.03837539 1.0555291 2.945274e-01 > anova(fit) Analysis of Variance Table Response: value Df Sum Sq Mean Sq F value Pr(>F) dose 4 0.06086 0.01521 0.8180 0.517660 time 1 0.00524 0.00524 0.2818 0.597070 batch 2 0.27889 0.13944 7.4968 0.001068 ** Residuals 76 1.41362 0.01860 --- Signif. codes: 0 ***' 0.001 **' 0.01 *' 0.05 .' 0.1 ` ' 1 > anova(rfit) Analysis of Variance Table To minumize the batch effect, e.g. to find true influences of the dose I thought it may be a good idea to exclude obvious outliers. I've done this by excludung data point >2 SD from the mean within a dose at a timepoint. I guess that's not realy elegant ... . Would it be apporpiate to use a robust lm (rlm or lqs) to "downgrade" the outliers in importance? Could a robutsly fitted linear model be used for an anova, e.g. do weight the data within the groups? E.g. rfit <- rlm(value ~ dose + time + batch, d) Why are there no p-values given for the t-statistics? > summary(rfit)$coefficients Value Std. Error t value (Intercept) 6.781613686 0.04215968 160.8554538 dose010mM -0.038060320 0.04628847 -0.8222418 dose025mM -0.042400425 0.04798856 -0.8835528 dose050mM -0.078787526 0.05308349 -1.4842192 dose100mM 0.008569384 0.04628847 0.1851300 time24h 0.005742874 0.03105505 0.1849256 batchOLD 0.140608264 0.03688384 3.8121911 batchPRG 0.031820089 0.04004375 0.7946331 ... and no p-values for the F-stats/F-values either ... > anova(rfit) Analysis of Variance Table Response: value Df Sum Sq Mean Sq F value Pr(>F) dose 4 0.07794 0.01949 time 1 0.00165 0.00165 batch 2 0.29484 0.14742 Residuals 1.42161 lqs doesn't have any implementation for anova, is it theoretically not possible to use the model for an anova? As an alternative I'm thinking about using the "weights" option for lm. Maybe the distance from the mean in units of standard deviations w <- abs(mean(i) - i) / sd(i) w <- 1/w # to give small weight to outliers I'd be happy to discuss these options with you. kind regards + thanks a lot for your comments, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
dose • 604 views