Question: F-tests for factorial effects - limma
0
14.5 years ago by
Naomi Altman6.0k
Naomi Altman6.0k wrote:
Thanks, Gordon. We will try contrasts.fit. If that does not give us what we want, I will consider writing an aov.series routine. The microarray experiments I see are often factorial, RCB, or split plot designs. Particularly with Affy arrays, they are often balanced. I guess I am used to thinking in terms of "effects" rather than individual contrasts, but why would the main effect and interaction idea be different for microarray experiments than for other experiments? e.g. treatment 1-3 genotype 1-3 The interaction is always of interest. However, even in the absence of interaction, the genotype or treatment effect is often of interest. e.g. the genotype may be various knockouts - we are interested in knowing what other genes have been induced or repressed. Why is the test for main effects for a particular gene less interesting in this context than e.g. the change in weight or other univariate measure? --Naomi At 12:23 AM 12/23/2004 +1100, Gordon K Smyth wrote: > > Date: Tue, 21 Dec 2004 17:21:19 -0500 > > From: Naomi Altman <naomi@stat.psu.edu> > > Subject: [BioC] F-tests for factorial effects - limma > > To: bioconductor@stat.math.ethz.ch > > > > I am analyzing a 2-factor factorial Affy experiment, with 3 d.f. for each > > factor. > > > > I would like to get the F-tests for the main effects and interactions using > > limma. > > > > I have computed all the contrasts, and got the t-tests (both unadjusted and > > eBayes). I do know how to combine these into F-tests "by hand" but I could > > not figure out if there was a simple way to do this using limma. > >limma doesn't have any easy way to deal with main effects and >interactions, at least not with main >effects, interactions are actually simpler. I haven't implemented this >because I've never been >able to figure out what one would do with these things in a microarray >context. > >To compute F-tests for main effects and interaction, the easiest way would >probably be to compute >the SS for main effects and interactions by non-limma means, then use >shrinkVar() to adjust the >residual mean squares, i.e., the F-statistic denominators. > >If you only want F-tests for interactions, the following code would work: > >X <- model.matrix(~a*b) >fit <- lmFit(eset, X) >p <- ncol(X) >cont.ia <- diag(p)[,attr(X,"assign")==3] >fit.ia <- eBayes(contrasts.fit(fit, cont.ia)) > >Now fit.ia contains the F-statistic and p-values for the interaction in >fit.ia$F and >fit.ia$F.p.value. > > > I had a look at FStat (classifyTestsF). There seems to be a problem, in > > that the matrix tstat is not premultiplied by the contrast matrix when the > > F-statistics are computed. So, if the contrasts are not full- rank, an > > error is generated (instead of the F-statistics) because nrow(Q) != > > ncol(tstat).. (See the lines below). > >No, the code is correct. FStat is quite happy with non full rank >contrasts but the contrast >matrix must be applied using contrasts.fit() before entering FStat(). You >should not expect to >see a contrast matrix inside the classifyTestsF() code. > >Gordon > > > if (fstat.only) { > > fstat <- drop((tstat%*% Q)^2 %*% array(1, c(r, 1))) > > attr(fstat, "df1") <- r > > attr(fstat, "df2") <- df > > return(fstat) > > } > > > > I figured that before I fiddled with the code, I would check to make sure > > that I didn't miss an existing routine to do this. > > > > Thanks in advance. > > > > Naomi S. Altman 814-865-3791 (voice) > > Associate Professor > > Bioinformatics Consulting Center > > Dept. of Statistics 814-863-7114 (fax) > > Penn State University 814-865-1348 (Statistics) > > University Park, PA 16802-2111 Naomi S. Altman 814-865-3791 (voice) Associate Professor Bioinformatics Consulting Center Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111
microarray affy limma • 1.4k views
modified 14.5 years ago by Gordon Smyth37k • written 14.5 years ago by Naomi Altman6.0k
Answer: F-tests for factorial effects - limma
0
14.5 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:
Why don't I find main effects of interest? Let's suppose you're considering genotype*treatment and the genotypes are wt and ko (knockout). My experience is that an experimenter wants to know which genes respond to the treatment in the wt, which genes respond in the ko, and which respond differently between the two genotypes (the interaction). So the experiment wants two separate lists and want to compare and constrast these lists. But the factorial main effect for the treatment refers to the *average* response for a each gene averaged over wt and ko. So it is not a real effect for any given genotype. I've never seen a microarray context where the the experimenter wanted to know responses averaged over different and essentially different genotypes. Of course one may want an effect averaged over different cell lines or individuals from the same genotype, but not over different genotypes which are essentially different. That's my take on it. I would be interested in different opinions. I've written briefly on this in the Factorial Design section on page 30 of the User's Guide at http://bioinf.wehi.edu.au/limma/usersguide.pdf The reason that main effects are computationally different to interactions is that I have to treat all microarray experiments as unbalanced designs. I have to do this because I have to allow for missing values or spot quality weights in the case of two-colour data and for weights arising from robust estimation or affyPLM standard errors in the case of affy data. This means that the anova break-down is a sequential ANOVA table with first main effect adjusted for intercept, second main effect adjusted for the first, and interaction adjusted for all main effects. In the case of the interaction one can just fit the full model and then test that the coefficients are zero. With main effects one can't do this because that would be equivalent to adjusting the main effects for the interaction which is not what is wanted. This means that to get the main effect SS you actually have to fit three or four different models to get the sequential SS. Of course you could othogonalize all the coefficients to solve the problem, but this would have to be done on a gene-wise basis to allow for the possibility of gene-specific weights. Gordon At 01:14 AM 23/12/2004, Naomi Altman wrote: >Thanks, Gordon. We will try contrasts.fit. If that does not give us what >we want, I will consider writing an aov.series routine. > >The microarray experiments I see are often factorial, RCB, or split plot >designs. Particularly with Affy arrays, they are often balanced. I guess >I am used to thinking in terms of "effects" rather than individual >contrasts, but why would the main effect and interaction idea be different >for microarray experiments than for other experiments? > >e.g. treatment 1-3 > genotype 1-3 > >The interaction is always of interest. However, even in the absence of >interaction, the genotype or treatment effect is often of interest. e.g. >the genotype may be various knockouts - we are interested in knowing what >other genes have been induced or repressed. Why is the test for main >effects for a particular gene less interesting in this context than e.g. >the change in weight or other univariate measure? > > >--Naomi > >At 12:23 AM 12/23/2004 +1100, Gordon K Smyth wrote: >> > Date: Tue, 21 Dec 2004 17:21:19 -0500 >> > From: Naomi Altman <naomi@stat.psu.edu> >> > Subject: [BioC] F-tests for factorial effects - limma >> > To: bioconductor@stat.math.ethz.ch >> > >> > I am analyzing a 2-factor factorial Affy experiment, with 3 d.f. for each >> > factor. >> > >> > I would like to get the F-tests for the main effects and interactions >> using >> > limma. >> > >> > I have computed all the contrasts, and got the t-tests (both >> unadjusted and >> > eBayes). I do know how to combine these into F-tests "by hand" but I >> could >> > not figure out if there was a simple way to do this using limma. >> >>limma doesn't have any easy way to deal with main effects and >>interactions, at least not with main >>effects, interactions are actually simpler. I haven't implemented this >>because I've never been >>able to figure out what one would do with these things in a microarray >>context. >> >>To compute F-tests for main effects and interaction, the easiest way >>would probably be to compute >>the SS for main effects and interactions by non-limma means, then use >>shrinkVar() to adjust the >>residual mean squares, i.e., the F-statistic denominators. >> >>If you only want F-tests for interactions, the following code would work: >> >>X <- model.matrix(~a*b) >>fit <- lmFit(eset, X) >>p <- ncol(X) >>cont.ia <- diag(p)[,attr(X,"assign")==3] >>fit.ia <- eBayes(contrasts.fit(fit, cont.ia)) >> >>Now fit.ia contains the F-statistic and p-values for the interaction in >>fit.ia$F and >>fit.ia$F.p.value. >> >> > I had a look at FStat (classifyTestsF). There seems to be a problem, in >> > that the matrix tstat is not premultiplied by the contrast matrix when the >> > F-statistics are computed. So, if the contrasts are not full- rank, an >> > error is generated (instead of the F-statistics) because nrow(Q) != >> > ncol(tstat).. (See the lines below). >> >>No, the code is correct. FStat is quite happy with non full rank >>contrasts but the contrast >>matrix must be applied using contrasts.fit() before entering >>FStat(). You should not expect to >>see a contrast matrix inside the classifyTestsF() code. >> >>Gordon >> >> > if (fstat.only) { >> > fstat <- drop((tstat%*% Q)^2 %*% array(1, c(r, 1))) >> > attr(fstat, "df1") <- r >> > attr(fstat, "df2") <- df >> > return(fstat) >> > } >> > >> > I figured that before I fiddled with the code, I would check to make sure >> > that I didn't miss an existing routine to do this. >> > >> > Thanks in advance. >> > >> > Naomi S. Altman 814-865-3791 (voice) >> > Associate Professor >> > Bioinformatics Consulting Center >> > Dept. of Statistics 814-863-7114 (fax) >> > Penn State University 814-865-1348 (Statistics) >> > University Park, PA 16802-2111
Thanks Gordon and Naomi for the discussion. I understand the example that Gordon gives, and the discussion and analyses in p. 30 and ff. of the limma user's guide. I take seriously the good old advice of not even looking at main effects and their estimates in the pressence of interactions. However, in many cases it is necessary to address the isues of what we do when there are/aren't interactions and how we obtain estimates of main effects over levels of another factor. For instance, a growing concern with many microarray results in cancer research is that these are observational data, samples collected from patients that, annoyingly, differ in traits such as sex, age, diet, exercise, etc, which can affect gene expression. But these microarray data are often analized just for differences between cancer status without due considertaion to variation in other factors. J.D. Potter (and others) have made this point recently in several papers, where it is argued that many of the differences between cancer types seen in microarray studies could be the result of confounding by age and sex. So now suppose we are comparing gene expression in women with and without breast cancer, and want to include possible age effects (this is already much too simple a scenario). In a case such as this I think that a very reasonable biological question is to try to find those genes that show a difference between cancer and non-cancer patients, AND do NOT show an interaction age*cancer status. The estimate of interest for gene difference is usually an estimate over all ages. A possible approach might be to carry out first a test of interaction. Then, use a very liberal criterion for "significance of interaction" (e.g., unadjusted p < 0.1). Finally, fit a model without interaction only to those genes that show no evidence of interaction, and obtain estimates and p-values of main effects (the liberal p-value for interaction would be a way of trying to overcome low power for detecting interactions). I see two main problems here. One is the "pre-test estimation" issue (see recent thread about this issue by F. Hong, N. Altman and G. Smyth), but we definitely do not want to obtain the estimates of effect from a model with interaction (as Gordon mentions in his second paragraph). The second problem is that it is not obvious to me how to adjust the p-values from the second set of analyses. I think some of the recent work of Y. Benjamini on D. Yekutieli on hierarchical FDR on trees of hypotheses mighe help here. R. On Friday 07 January 2005 02:12, Gordon Smyth wrote: > Why don't I find main effects of interest? Let's suppose you're considering > genotype*treatment and the genotypes are wt and ko (knockout). My > experience is that an experimenter wants to know which genes respond to the > treatment in the wt, which genes respond in the ko, and which respond > differently between the two genotypes (the interaction). So the experiment > wants two separate lists and want to compare and constrast these lists. But > the factorial main effect for the treatment refers to the *average* > response for a each gene averaged over wt and ko. So it is not a real > effect for any given genotype. I've never seen a microarray context where > the the experimenter wanted to know responses averaged over different and > essentially different genotypes. Of course one may want an effect averaged > over different cell lines or individuals from the same genotype, but not > over different genotypes which are essentially different. That's my take on > it. I would be interested in different opinions. I've written briefly on > this in the Factorial Design section on page 30 of the User's Guide at > http://bioinf.wehi.edu.au/limma/usersguide.pdf > > The reason that main effects are computationally different to interactions > is that I have to treat all microarray experiments as unbalanced designs. I > have to do this because I have to allow for missing values or spot quality > weights in the case of two-colour data and for weights arising from robust > estimation or affyPLM standard errors in the case of affy data. This means > that the anova break-down is a sequential ANOVA table with first main > effect adjusted for intercept, second main effect adjusted for the first, > and interaction adjusted for all main effects. In the case of the > interaction one can just fit the full model and then test that the > coefficients are zero. With main effects one can't do this because that > would be equivalent to adjusting the main effects for the interaction which > is not what is wanted. This means that to get the main effect SS you > actually have to fit three or four different models to get the sequential > SS. Of course you could othogonalize all the coefficients to solve the > problem, but this would have to be done on a gene-wise basis to allow for > the possibility of gene-specific weights. > > Gordon > > At 01:14 AM 23/12/2004, Naomi Altman wrote: > >Thanks, Gordon. We will try contrasts.fit. If that does not give us what > >we want, I will consider writing an aov.series routine. > > > >The microarray experiments I see are often factorial, RCB, or split plot > >designs. Particularly with Affy arrays, they are often balanced. I guess > >I am used to thinking in terms of "effects" rather than individual > >contrasts, but why would the main effect and interaction idea be different > >for microarray experiments than for other experiments? > > > >e.g. treatment 1-3 > > genotype 1-3 > > > >The interaction is always of interest. However, even in the absence of > >interaction, the genotype or treatment effect is often of interest. e.g. > >the genotype may be various knockouts - we are interested in knowing what > >other genes have been induced or repressed. Why is the test for main > >effects for a particular gene less interesting in this context than e.g. > >the change in weight or other univariate measure? > > > > > >--Naomi > > > >At 12:23 AM 12/23/2004 +1100, Gordon K Smyth wrote: > >> > Date: Tue, 21 Dec 2004 17:21:19 -0500 > >> > From: Naomi Altman <naomi@stat.psu.edu> > >> > Subject: [BioC] F-tests for factorial effects - limma > >> > To: bioconductor@stat.math.ethz.ch > >> > > >> > I am analyzing a 2-factor factorial Affy experiment, with 3 d.f. for > >> > each factor. > >> > > >> > I would like to get the F-tests for the main effects and interactions > >> > >> using > >> > >> > limma. > >> > > >> > I have computed all the contrasts, and got the t-tests (both > >> > >> unadjusted and > >> > >> > eBayes). I do know how to combine these into F-tests "by hand" but I > >> > >> could > >> > >> > not figure out if there was a simple way to do this using limma. > >> > >>limma doesn't have any easy way to deal with main effects and > >>interactions, at least not with main > >>effects, interactions are actually simpler. I haven't implemented this > >>because I've never been > >>able to figure out what one would do with these things in a microarray > >>context. > >> > >>To compute F-tests for main effects and interaction, the easiest way > >>would probably be to compute > >>the SS for main effects and interactions by non-limma means, then use > >>shrinkVar() to adjust the > >>residual mean squares, i.e., the F-statistic denominators. > >> > >>If you only want F-tests for interactions, the following code would work: > >> > >>X <- model.matrix(~a*b) > >>fit <- lmFit(eset, X) > >>p <- ncol(X) > >>cont.ia <- diag(p)[,attr(X,"assign")==3] > >>fit.ia <- eBayes(contrasts.fit(fit, cont.ia)) > >> > >>Now fit.ia contains the F-statistic and p-values for the interaction in > >>fit.ia$F and > >>fit.ia$F.p.value. > >> > >> > I had a look at FStat (classifyTestsF). There seems to be a problem, > >> > in that the matrix tstat is not premultiplied by the contrast matrix > >> > when the F-statistics are computed. So, if the contrasts are not > >> > full-rank, an error is generated (instead of the F-statistics) because > >> > nrow(Q) != ncol(tstat).. (See the lines below). > >> > >>No, the code is correct. FStat is quite happy with non full rank > >>contrasts but the contrast > >>matrix must be applied using contrasts.fit() before entering > >>FStat(). You should not expect to > >>see a contrast matrix inside the classifyTestsF() code. > >> > >>Gordon > >> > >> > if (fstat.only) { > >> > fstat <- drop((tstat%*% Q)^2 %*% array(1, c(r, 1))) > >> > attr(fstat, "df1") <- r > >> > attr(fstat, "df2") <- df > >> > return(fstat) > >> > } > >> > > >> > I figured that before I fiddled with the code, I would check to make > >> > sure that I didn't miss an existing routine to do this. > >> > > >> > Thanks in advance. > >> > > >> > Naomi S. Altman 814-865-3791 (voice) > >> > Associate Professor > >> > Bioinformatics Consulting Center > >> > Dept. of Statistics 814-863-7114 (fax) > >> > Penn State University 814-865-1348 > >> > (Statistics) University Park, PA 16802-2111 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Ram?n D?az-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncol?gicas (CNIO) (Spanish National Cancer Center) Melchor Fern?ndez Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) Este correo electronico y, en su caso, cualquier fichero anexo al mismo, contiene informacion exclusivamente dirigida a su destinatario o destinatarios. Si Vd. ha recibido este mensaje por error, se ruega notificar esta circunstancia al remitente. Las ideas y opiniones manifestadas en este mensaje corresponden unicamente a su autor y no representan necesariamente a las del Centro Nacional de Investigaciones Oncologicas (CNIO). The information contained in this message is intended for the addressee only. If you have received this message in error or there are any problems please notify the originator. Please note that the Spanish National Cancer Centre (CNIO), does not accept liability for any statements or opinions made which are clearly the sender's own and not expressly made on behalf of the Centre.