Hi, I'm having a hard time understanding how design matrices work in linear models.

From the limma User's Guide:

        WT  MUvsWT
Array1   1       0
Array2   1       0
Array3   1       1
Array4   1       1
Array5   1       1

I do not understand how the WT coefficient (the intercept) can be the average expression for WT when its column has a value of 1 even for the MU samples. Could someone explain how to interpret this intercept? In theory I understand what an intercept is, but I don't understand how this vector captures the mean log-expression for the first factor level (WT) if the value is the same for every sample. Where is the information about the levels in this vector?

Let's say the mutant group has an average log-expression of X, and the wild-type group has an average log-expression of Y. In your fitted linear model, the sum of the coefficients that have a 1 in a sample's row of the design matrix is equal to the average for the group to which that sample belongs:

WT = Y # For wild-type
MUvsWT + WT = X # For mutant

So it's simple to see that the WT coefficient is equal to Y, i.e., the mean of the wild-type group. The MUvsWT coefficient is simply X - Y, i.e., the log-fold change of the mutant group over the wild-type group. Since MUvsWT is a log-fold change, it needs to be added onto the average log-expression of the wild-type group to obtain the average for the mutant group. This is why you need values of 1 for the intercept in the mutant samples. The information about the levels is carried by the MUvsWT column, which is 0 for wild-type and 1 for mutant samples; the intercept column on its own does not distinguish the groups.
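
If it helps to see this with numbers, here is a minimal sketch in plain Python (not limma itself) that fits the same two-column design by ordinary least squares. The log-expression values in `y` are made up for illustration, and `fit_two_column` is a hypothetical helper written for this example, not a limma function.

```python
# Design matrix from the question: columns are WT (intercept) and MUvsWT.
design = [
    [1, 0],  # Array1, wild-type
    [1, 0],  # Array2, wild-type
    [1, 1],  # Array3, mutant
    [1, 1],  # Array4, mutant
    [1, 1],  # Array5, mutant
]

# Hypothetical log-expression values for one gene (made up for illustration).
y = [5.0, 7.0, 9.0, 10.0, 11.0]


def fit_two_column(design, y):
    """Least-squares fit for a two-column design via the normal equations."""
    # Entries of the symmetric 2x2 matrix X^T X and of the vector X^T y.
    a = sum(r[0] * r[0] for r in design)
    b = sum(r[0] * r[1] for r in design)
    d = sum(r[1] * r[1] for r in design)
    p = sum(r[0] * v for r, v in zip(design, y))
    q = sum(r[1] * v for r, v in zip(design, y))
    det = a * d - b * b
    # Solve the 2x2 system with Cramer's rule.
    return (d * p - b * q) / det, (a * q - b * p) / det


wt_coef, mu_vs_wt_coef = fit_two_column(design, y)

# Group means computed directly, using the MUvsWT column to tell groups apart.
wt_values = [v for r, v in zip(design, y) if r[1] == 0]
mu_values = [v for r, v in zip(design, y) if r[1] == 1]
wt_mean = sum(wt_values) / len(wt_values)
mu_mean = sum(mu_values) / len(mu_values)

print("WT coefficient:", wt_coef, "| wild-type mean (Y):", wt_mean)
print("MUvsWT coefficient:", mu_vs_wt_coef, "| X - Y:", mu_mean - wt_mean)
print("WT + MUvsWT:", wt_coef + mu_vs_wt_coef, "| mutant mean (X):", mu_mean)
```

With these made-up values the WT coefficient matches the wild-type mean and the MUvsWT coefficient matches the difference between the two group means, even though the intercept column is 1 for every array.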