Question: when do linear models work?
15.6 years ago by
Arne.Muller@aventis.com • 620
Hello All,

I have two fundamental problems with linear models (lm); maybe you can help me clarify these issues:

1. Irrespective of how many factors you use in your experiment, the relationship is always assumed to be linear. If you have a response vector Y and a vector X of independent variables, then Y ~ X basically assumes a straight line (with some kind of slope). If you fit, say, Y ~ X + Z, then one can think of the lm as a *flat* surface. The same is true for higher dimensions (Y ~ dose + time + batch + gender + ...).

This assumption is really dangerous, I think, since many treatment/response relationships are not linear. For example, think about this experiment: I have 5 doses (0.0mM, 0.10mM, 0.25mM, 0.5mM and 1.0mM) of a drug with which cell cultures are treated. The 0.1mM dose causes hardly any change in gene expression, whereas there is a big difference in gene expression at 0.25mM. At 0.5mM and 1.0mM the response is not much stronger than at 0.25mM. If one looks at a single gene, its expression goes up quite strongly from 0.1mM to 0.25mM and then flattens out for the higher doses: the response reaches saturation. Other responses look more like a logistic curve.

This is a typical scenario. The problem is that many genes within one experiment behave as described above, while others change linearly and others exponentially ... Could I still use lm for this kind of experiment? Would I have to decide on a gene-by-gene basis?

2. Some factors, such as treatment (T), can only take, say, 2 distinct values in an experiment: treated (t) and untreated (ut). Does a model such as Y ~ T make any sense in this case? Doesn't this assume a linear relationship between just 2 "clouds" of data (assume there are many samples for each factor level)? Even if one can clearly distinguish between t and ut, assuming a straight line may be wrong. This is like drawing a straight line between two points. Just as in my example above with the different doses, you may already have reached some kind of saturation. Using such a model for prediction would then give wrong results. However, if one just wants to distinguish between t and ut, would the lm be a valid method?

I'm reading some "beginners" literature about lm's, and I'm just trying to understand what's going on ... Maybe you could comment on this. I'd be very interested in any explanation or clarification.

kind regards,
Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
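The two cases raised in the question can be made concrete with a small sketch. It is written in Python using only the standard library, not R's lm. The helper `fit_line` and all expression values are invented for illustration; they are not data from the experiment described. The sketch fits a one-predictor least-squares line to (a) a two-level treatment coded 0/1 and (b) a made-up saturating dose response at the five doses mentioned above.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor).

    Hypothetical helper for this sketch; it is not R's lm().
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# (a) Y ~ T with T coded 0 = untreated, 1 = treated (invented values).
t = [0, 0, 0, 1, 1, 1]
y_t = [1.0, 1.2, 0.8, 3.1, 2.9, 3.0]
intercept_t, slope_t = fit_line(t, y_t)
mean_ut = mean(y_t[:3])
mean_tr = mean(y_t[3:])
# With only two levels, the fitted line passes through both group means:
# the intercept is the untreated mean and the slope is the difference
# between the treated and untreated means.
print(intercept_t, mean_ut)
print(slope_t, mean_tr - mean_ut)

# (b) Straight line through a saturating dose response (invented values).
dose = [0.0, 0.10, 0.25, 0.5, 1.0]
y_d = [1.0, 1.1, 3.0, 3.2, 3.3]
intercept_d, slope_d = fit_line(dose, y_d)
residuals = [yi - (intercept_d + slope_d * di) for di, yi in zip(dose, y_d)]
# The residuals are not scattered randomly around zero: they run
# negative, negative, positive, positive, negative along the dose axis,
# which is the signature of a curve being approximated by a line.
print([round(r, 2) for r in residuals])
```

In case (a) nothing is interpolated between the two levels, so under this coding the model only compares two group means. In case (b) the systematic sign pattern in the residuals shows what a straight line in dose misses when the response saturates.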