6.6 years ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Your test for the 3-way interaction is correct, although 3-way
interactions are pretty hard to interpret.
However testing for the 2-way interaction in the presence of a 3-way
interaction does not make statistical sense. This is because the
parametrization of the 2-way interaction as a subset of the 3-way is
somewhat arbitrary. Before you can test the 2-way interaction
species*treatment in a meaningful way you would need to accept that
3-way interaction is not necessary and remove it from the model.
In general, I am of the opinion that classical statistical factorial
interation models do not usually provide the most meaningful
parametrizations for genomic experiments. In most cases, I prefer to
the saturated model (a different level for each treatment combination)
make specific contrasts. There is some discussion of this in the
In your case, I guess that you might want to test for
interaction separately at each time point. It is almost impossible to
this within the classical 3-way factorial setup. However it is easy
the one-way approach I just mentioned, or else you could use:
~Age + Age:Species + Age:Treatment + Age:Species:Treatment
> Date: Thu, 9 May 2013 14:55:46 -0400
> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu="">
> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
> Subject: [BioC] Statistics question for multi-factor interaction
> in edgeR
> Hi. I need to generate two GLM tests of a factorial design with RNA-
> count data. I have 3 factors with 2 levels apiece (2 species X 2
> treatments X 2 times), and 4 separate replicates each (i.e., we made
> total of 2*2*2*4 = 32 separate libraries). Our main interest is in
> interaction of species*treatment, as we think species A will alter
> expression in the treatment stress vs. treatment benign, whereas
> species B is expected to show little change. However, we?d like to
> do another test of species*treatment*time, because it is possible
> the ability of species A to alter gene expression in response to the
> stress treatment may differ at the 1st versus 2nd time point.
> I think the way to set this up, is to create a design matrix as
> with the lrt test with coef 5 giving the differentially expressed
> for the species*treatment test, and coef 8 giving the the
> expressed gene for the species*treatment*time test (after calling
> topTags that is). Yet to ensure I have the statistics correct, my
> questions are: (1) is this thinking correct, as I don?t see many 3x2
> factorial models to follow, and (2) do I need to set up a reference
> somehow (which I assume would be the set of four samples with
> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is
> correct or needed).
> Many thanks in advance for your insight!
>> designFF <- model.matrix(~Treatment*Species*Age)
>  "(Intercept)"
>  " TreatmentStress"
>  "SpeciesA "
>  "Time1"
>  "TreatmentStress:SpeciesA"
>  "TreatmentStress:Time1"
>  "SpeciesA:Time1"
>  "TreatmentStress:SpeciesA:Time1"
> And then to run tests with:
>> fit <- glmFit(y, designFF)
>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5)
>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8)
The information in this email is confidential and