Hello, I am having a lot of difficulty setting up my design matrix. I have been searching extensively online for an answer but I think I lack some of the background to understand how to do this since my experiment is fairly complicated. I think I have a basic grasp of setting up model matrices in simple experiments but contrast coding is still a complete mystery to me.
This is a RIP-Seq experiment where I pulled down a protein a of interest with an antibody and then extracted the attached RNA which was then sequenced. This was performed in either wild-type (WT) tissue or protein knock-out (KO) where the KO is assumed to represent some kind of noise or background for the experiment. WT or KO tissue is from animals injected with various drugs.
My basic design is as follows:
I have tried setting up my model matrix a few different ways using
model.matrix() but I don't always see all the coefficients I am interested in. The kind of interaction I am interested in would be either Control_WT:Control_KO:Drug2_WT:Drug2_KO or Drug1_WT:Drug1_KO:Drug2_WT:Drug2_KO. In both of these cases, what I am really interested in is the WT conditions fold-change (i.e. Control over Drug2) but I want the KO noise to be modeled in as well. This 4-way interaction seems completely wrong and way too complicated. How is such modeling normally done? Limma refuses to use such columns if I try ("coefficients not estimable").
This brings me to a second question. Say I just had one coefficient ("Drug1_WT" for example). What exactly is limma giving back to me after
topTable()$AveExpr? This condition isn't being compared to anything is it? Or would it be compared to any other coefficients that come before it if I used an additive model to set up the regression equation?
The limma reference seems to suggest that the "AveExpr" column is the fold change compared to all other columns in the regression equation. If that is the case, then [limma documentation does not say this, see Gordon Smyth's answer below] should my coefficients really be Control_WT:Control_KO and then Drug2_WT:Drug2_KO? Then when I perform the fit and use
topTable() to find the Control_WT:Control_KO coefficient, the AveExpr column would reflect the fold change relative to Drug2_WT:Drug2_KO?
Sorry if this is really confusing. Any guidance on how to best set up my model matrix (and/or contrast matrix if it's even needed) would be greatly appreciated.