robust linear model in anovas
0
0
Entering edit mode
@arnemulleraventiscom-466
Last seen 9.6 years ago
Hello, I'm trying to analyze some toxicology data, and I got some problems with lm and rlm for which I'm looking for help. The experiment is a 3 factor design: - the value (the y vector) is a response for a treatment with a drug - a dose factor with 5 levles (a drug in 5 concentrations) - a time factor with 2 levels (time of treatment) - a batch factor with 3 levels corresponding to the laboratories that analyzed the data Within each dose, time and batch I've 3 "technical" replicates. I'm looking for genes for which the expression is altered mainly by the dose at any of the time points. Ideally I'd just like to merge the 3 replicates produced by each lab into one group with then 9 replicates (i.e. omitting the batch factor). Unfortunately the batch factor is very strong and introduces quite a large variance within each time and dose level. I know that sometimes one laboratory is quite a strong outlier, but the "extreme" lab changes from gene to gene! When looking at a stripchart for the dose (within one timepoint) and merged batches the datapoints are scattered, and often 2 or 3 of the replicates from one lab are >2 standard deviations away from the mean. I'm doing a simple lm to quantify the main effects (creating an lm for each gene). fit <- lm(value ~ dose + time + batch, d) > summary(fit)$coefficients Estimate Std. Error t value Pr(>|t|) (Intercept) 6.772575997 0.04040316 167.6249085 2.174127e-99 dose010mM -0.038755967 0.04435994 -0.8736704 3.850501e-01 dose025mM -0.028481675 0.04598919 -0.6193123 5.375628e-01 dose050mM -0.069415506 0.05087185 -1.3645171 1.764316e-01 dose100mM 0.008306756 0.04435994 0.1872580 8.519573e-01 time24h 0.015798762 0.02976119 0.5308511 5.970699e-01 batchOLD 0.134205422 0.03534714 3.7967832 2.930010e-04 batchPRG 0.040506343 0.03837539 1.0555291 2.945274e-01 > anova(fit) Analysis of Variance Table Response: value Df Sum Sq Mean Sq F value Pr(>F) dose 4 0.06086 0.01521 0.8180 0.517660 time 1 0.00524 0.00524 0.2818 0.597070 batch 2 0.27889 0.13944 7.4968 0.001068 ** Residuals 76 1.41362 0.01860 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > anova(rfit) Analysis of Variance Table To minumize the batch effect, e.g. to find true influences of the dose I thought it may be a good idea to exclude obvious outliers. I've done this by excludung data point >2 SD from the mean within a dose at a timepoint. I guess that's not realy elegant ... . Would it be apporpiate to use a robust lm (rlm or lqs) to "downgrade" the outliers in importance? Could a robutsly fitted linear model be used for an anova, e.g. do weight the data within the groups? E.g. rfit <- rlm(value ~ dose + time + batch, d) Why are there no p-values given for the t-statistics? > summary(rfit)$coefficients Value Std. Error t value (Intercept) 6.781613686 0.04215968 160.8554538 dose010mM -0.038060320 0.04628847 -0.8222418 dose025mM -0.042400425 0.04798856 -0.8835528 dose050mM -0.078787526 0.05308349 -1.4842192 dose100mM 0.008569384 0.04628847 0.1851300 time24h 0.005742874 0.03105505 0.1849256 batchOLD 0.140608264 0.03688384 3.8121911 batchPRG 0.031820089 0.04004375 0.7946331 ... and no p-values for the F-stats/F-values either ... > anova(rfit) Analysis of Variance Table Response: value Df Sum Sq Mean Sq F value Pr(>F) dose 4 0.07794 0.01949 time 1 0.00165 0.00165 batch 2 0.29484 0.14742 Residuals 1.42161 lqs doesn't have any implementation for anova, is it theoretically not possible to use the model for an anova? As an alternative I'm thinking about using the "weights" option for lm. Maybe the distance from the mean in units of standard deviations w <- abs(mean(i) - i) / sd(i) w <- 1/w # to give small weight to outliers I'd be happy to discuss these options with you. kind regards + thanks a lot for your comments, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
DOSE DOSE • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 460 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6