Linear modeling for affy experiment

0

Entering edit mode

YUK FAI LEUNG ▴ 140

@yuk-fai-leung-605

Last seen 9.6 years ago

Hi there, I am going to do an affy experiment for the first time. I have a few questions about the linear model design for my experiment. I have a 3x2 factorial experiment. Three biological samples (wild type whole animal (WA), wild type tissue (WT), mutant tissue (MT)) and two time points (t1 and t2). The effects of interest are the mutant (M) and tissue (T) specific expression and their changes other time (Ti). I suppose the model should have the following equations (error term omitted) and I would have 3 affy biological replicates for each condition. WA.t1 = mu WT.t1 = mu + T MT.t1 = mu + T + M + T*M WA.t2 = mu + Ti WT.t2 = mu + T + Ti + T*Ti MT.t2 = mu + T + M + Ti + T*M + M*Ti + T*Ti + T*M*Ti My questions are: 1. How many degree of freedom do I have in the model? How do I calculate degree of freedom in linear model in general? For my case, is it 3 arrays * 6 conditions - 8 coefficients to be estimated = 10 degree of freedom? 2. If I want to increase my degree of freedom, is it true that I can do it by increasing my replicate? If it is true, is there a difference between repeating a sample with more coeffcients (e.g. MT.t2) and a sample with less coefficients (e.g. WA.t1)? It seems to me having a repeat with more coefficients is better off, but I don't know have to stay it out statistically. 3. What is the formal way to determine whether an interaction term is meaningful/significant in the model or not? Is it by the p-value? And should I remove the term and fit the model (& again) if it is not significant and deemed not important by biological knowledge? Or should I just fit the full model once and go ahead to interpret the contrasts of interest? Is there a formal way (e.g. the diagnostics people use to assess ANOVA models) for evaluating the quality of the whole fitted model? Or I need not worry about this at all? 5. I have some confusion about the multiple hypothesis testing adjustment for many contrasts. (I know I should better only use the p-values/B/moderated t for ranking genes, but I am just curious to know). For example in limma one would extract the contrast of interest and list the candidate genes out on Toptable with the option = FDR etc., but isn't it true that this is just the adjustment for that estimate? When I evaluate all possible contrasts, how can I adjust the multiple hypothesis testing for the genes in all the contrasts that I have made? 6. A minor question. What does M & A in the Toptable of a coefficient/contrast mean for affy data? If A stands the log2 intensity estimate for that coefficient/contrast, is M the log2 ratio of (mu + (coefficient or contrast estimate))/mu? Thanks a lot for answering my questions. Any other advice for my design is also welcome. Best regards, Fai -- Yuk Fai Leung Department of Molecular and Cellular Biology Harvard University BL 2079, 16 Divinity Avenue Cambridge, MA 02138 Tel: 617-495-2599 Fax: 617-496-3321 email: yfleung@mcb.harvard.edu; yfleung@genomicshome.com URL: http://genomicshome.com

GO affy limma GO affy limma • 1.1k views

ADD COMMENT • link updated 19.9 years ago by Arne.Muller@aventis.com ▴ 620 • written 19.9 years ago by YUK FAI LEUNG ▴ 140

0

Entering edit mode

Arne.Muller@aventis.com ▴ 620

@arnemulleraventiscom-466

Last seen 9.6 years ago

Hi Fai, Below I comment on some of your questions. I'm courious my self about the answers other people will post ... > -----Original Message----- > From: bioconductor-bounces@stat.math.ethz.ch > [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of YUK FAI > LEUNG > Sent: 10 June 2004 23:06 > To: bioconductor@stat.math.ethz.ch > Subject: [BioC] Linear modeling for affy experiment > > > Hi there, > > I am going to do an affy experiment for the first time. I have a few > questions about the linear model design for my experiment. > > I have a 3x2 factorial experiment. Three biological samples > (wild type > whole animal (WA), wild type tissue (WT), mutant tissue (MT)) and two > time points (t1 and t2). The effects of interest are the > mutant (M) and > tissue (T) specific expression and their changes other time (Ti). > > I suppose the model should have the following equations (error term > omitted) and I would have 3 affy biological replicates for > each condition. > > WA.t1 = mu > WT.t1 = mu + T > MT.t1 = mu + T + M + T*M > WA.t2 = mu + Ti > WT.t2 = mu + T + Ti + T*Ti > MT.t2 = mu + T + M + Ti + T*M + M*Ti + T*Ti + T*M*Ti I guess the model in R would just look like this: fit <- lm(Value ~ Sample*Time) where Sample is a factor with the levels WA, WT, MT, and Time has two levels. If you also want to include a 3rd factor which is either Genotype (wild type or mutant) or something like "origin" which is Animal or Tissue. I guess one will then have a problem with nesting. Any coments on this, did I miss the point ;-) ? > My questions are: > > 1. How many degree of freedom do I have in the model? How do > I calculate > degree of freedom in linear model in general? For my case, is it 3 > arrays * 6 conditions - 8 coefficients to be estimated = 10 degree of > freedom? Yes for the 2 factor model. > 2. If I want to increase my degree of freedom, is it true > that I can do > it by increasing my replicate? If it is true, is there a difference > between repeating a sample with more coeffcients (e.g. MT.t2) and a > sample with less coefficients (e.g. WA.t1)? It seems to me having a > repeat with more coefficients is better off, but I don't know have to > stay it out statistically. hm, I'm not sure what you mean. Off course it's good to run more replicates ;-) . You can boost your DFs by running your model on the probe level instead on the summarized probe sets. The full model would the look like this: fit <- lm(Value ~ Sample*Time*Probe) This adds a lot of DFs, since you've about 16 probes per probe set. > 3. What is the formal way to determine whether an interaction term is > meaningful/significant in the model or not? Is it by the p-value? And > should I remove the term and fit the model (& again) if it is not > significant and deemed not important by biological knowledge? > Or should > I just fit the full model once and go ahead to interpret the > contrasts > of interest? Is there a formal way (e.g. the diagnostics > people use to > assess ANOVA models) for evaluating the quality of the whole fitted > model? Or I need not worry about this at all? look at interaction.plot, this is helpful. Usually if lines in an interaction plot are sort of parallel there is no interaction, oterwise it means that the one factor is dependent on the level of the other. You can off course not look at all these plot if you run one model per gene,s o you may just judge by the p-value if there's a significant interaction. If there are not many significant interactions you can well omit the interacion term. The overall quality of the model can be tested by looking at the distribution of the R-squared values and the distribution of the residuals, these should be normal distributed. You can test this with a ks.test or shapiro.test. Often you'll see that the residuals are not normal (extreme tails) - this has an impact on. Kerr and Churchill have done work on this. You can look at the distribution of residuals of all models (from all genes) by calculating normalized residuals. (obs.values - fitted.values) / var(obs.values) where obs.values are the observed values You may also need to log transform your initial data. regards, Arne > 5. I have some confusion about the multiple hypothesis testing > adjustment for many contrasts. (I know I should better only use the > p-values/B/moderated t for ranking genes, but I am just curious to > know). For example in limma one would extract the contrast of > interest > and list the candidate genes out on Toptable with the option > = FDR etc., > but isn't it true that this is just the adjustment for that estimate? > When I evaluate all possible contrasts, how can I adjust the multiple > hypothesis testing for the genes in all the contrasts that I > have made? > > 6. A minor question. What does M & A in the Toptable of a > coefficient/contrast mean for affy data? If A stands the log2 > intensity > estimate for that coefficient/contrast, is M the log2 ratio of (mu + > (coefficient or contrast estimate))/mu? > > Thanks a lot for answering my questions. Any other advice for > my design > is also welcome. > > Best regards, > Fai > -- > Yuk Fai Leung > Department of Molecular and Cellular Biology > Harvard University > BL 2079, 16 Divinity Avenue > Cambridge, MA 02138 > Tel: 617-495-2599 > Fax: 617-496-3321 > email: yfleung@mcb.harvard.edu; yfleung@genomicshome.com > URL: http://genomicshome.com > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor >

ADD COMMENT • link 19.9 years ago Arne.Muller@aventis.com ▴ 620

Login before adding your answer.