Hello All
Is this the correct design matrix for the following experiment? The experiment has a Normal cell line (P) and a Mutant cell line (X) that were grown with decreasing nutrient levels, A, B, and C. A represents 100% of the nutrient for a particular component, B represents 50%, and C represents 0%. The Normal cells (P) could only grow in A and B, while the Mutant cells (X) could be grown in A, B, and C. The objective of the study was to identify gene expression changes (i.e., genes sensitive to the nutrient) that allowed the mutant to grow without the nutrient.
Design Matrix:
celltype <- factor(samples$ct, levels = c("P", "X"))
trt <- factor(samples$trt, levels= c("A", "B", "C"))
design <- model.matrix(~0+celltype+trt)
rownames(design) <- samples$file
design
colnames(design)
#[1] "celltypeP" "celltypeX" "trtB" "trtC"
Thanks
Hello Aaron
Thank you once again for your answers. I have some questions related to this experimental setup. I would like to pursue an unbalanced 2-factor ANOVA analysis to look at the differential genes involved in the main effects of treatment and cell type, along with their interaction. I know you mentioned a simpler one-way layout, but I'm curious to know how it would work with a 2-way layout. Since the data are unbalanced, I'm not entirely sure how to drop the coefficients to answer my questions: (1) the interaction of cell type and nutrient level, (2) the main effect of nutrient, irrespective of cell type, and (3) the main effect of cell type, irrespective of nutrient. Let me remind you about my experimental setup: a Normal cell line (P) and a Mutant cell line (X) that were grown with decreasing nutrient levels, A, B, and C. A represents 100% of the nutrient for a particular component, B represents 50%, and C represents 0%. The Normal cells (P) could only grow in A and B, while the Mutant cells (X) could be grown in A, B, and C.
Could you please help me as to how I could use the above model to understand the main effects?
The same goes for the model above: how can I use it to understand the interaction effects? I don't understand what this term means: celltypeX:trtC (as cell type P did not grow at nutrient level C).
Thanks
For your first model, the intercept represents the average log-expression of P cells in treatment A. celltypeX represents the log-fold change of X over P, i.e., the main effect of the cell type. trtB represents the log-fold change of B over A, and trtC represents the log-fold change of C over A, i.e., the main effects of treatment. You can drop these to test for each effect. However, this assumes that there are no significant interaction effects. If there are, you generally can't interpret the main effects in a sensible manner. For example, if a gene is upregulated in the mutant against normal in treatment A but downregulated in mutant against normal for treatment B, what would be the main effect of the mutation? Which direction would it be in? The simplicity of the additive model belies the strength of its underlying assumptions.
Anyway, for your second model, the intercept represents the average log-expression of P cells in treatment A. celltypeX represents the log-fold change of X over P in treatment A only. trtB represents the log-fold change of B over A in P cells only (but trtC is the same as before). celltypeX:trtB represents the interaction, i.e., how the effect of B treatment in X cells differs from that in P cells. celltypeX:trtC has no meaning (it's an all-zero or otherwise redundant column, as P cells were never grown in C) and should be removed. You can see how annoying this is to interpret, so it's easier (and statistically equivalent) to parametrize it as a one-way layout as I described originally.
Hello
I keep coming back with more questions. Could you please help me understand how and why a one-way design layout is statistically equivalent to a 2-factor or a 3-factor design matrix? It would be helpful if you could point out some kind of reference or a paper. How does the layout change when a factor is an ordinal type value?
Our basic understanding is that in traditional statistics, you perform the ANOVA analysis and then perform the post-hoc testing. Will this still be possible from the one-way layout too? In the example above, how can I interpret the main effect of the nutrient?
Thanks
These general statistics questions are getting beyond the scope of the original post, and indeed, beyond the scope of this forum. All I'll say is this:
Thanks Aaron.
Well, my first question had two factors: cell type and nutrient level. But you suggested grouping the factors together (which, if I understand correctly, turns the model into a one-factor layout). Now, just so I understand clearly, are you also suggesting that I use the grouping design to ask specific pairwise questions rather than performing an ANOVA analysis before I do the pairwise comparisons?
Yes. Obviously, you can still do an ANOVA across all groups:
... but it'll be harder to relate the results to your biological question, compared to doing specific pairwise comparisons where you know where the DE is occurring for each gene.
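As a rough illustration of the one-way layout and the pairwise comparisons discussed in this thread, here is a minimal base-R sketch. The sample sheet (two replicates per group), the group labels such as P.A, and the contrast names are all hypothetical and are not taken from the original posts; only the five observed cell type/nutrient combinations come from the experiment described above.

```r
# Hypothetical sample sheet (two replicates per group is an assumption);
# P cells were never grown in C, so there is no P.C group.
samples <- data.frame(
  ct  = rep(c("P", "X"), times = c(4, 6)),
  trt = c("A", "A", "B", "B", "A", "A", "B", "B", "C", "C")
)

# One-way layout: one factor level per observed cell type/nutrient combination.
group <- factor(paste(samples$ct, samples$trt, sep = "."),
                levels = c("P.A", "P.B", "X.A", "X.B", "X.C"))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# The two-way interaction model has six columns but only five estimable
# coefficients, the same number as there are groups in the one-way layout.
twoway <- model.matrix(~ct * trt, data = samples)
c(columns = ncol(twoway), rank = qr(twoway)$rank, groups = ncol(design))

# Comparisons written by hand as weights on the group means
# (each block of five numbers fills one column of the matrix).
contrast_matrix <- matrix(
  c(-1,  0,  1,  0,  0,   # X vs P within nutrient level A
     0, -1,  0,  1,  0,   # X vs P within nutrient level B
     0,  0, -1,  0,  1,   # C vs A within X cells
     1, -1, -1,  1,  0),  # interaction: (X.B - X.A) - (P.B - P.A)
  nrow = 5,
  dimnames = list(levels(group),
                  c("XvsP_A", "XvsP_B", "CvsA_X", "Interaction_B"))
)
contrast_matrix
```

Each column of contrast_matrix answers one specific question, which is why the results are easier to relate to the biology than a single test across all groups. How the matrix is handed to the fitting software depends on the package in use and is not shown here.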