Question: Model matrix for anova in limma
0
3.8 years ago by
serpalma.v50
Germany
serpalma.v50 wrote:

Dear members,
I am currently learning how to analyse micro array data and I am having problems designing a matrix for paired anova using limma.
The experiment consists in 4 animals. From each animal, 4 cell cultures where prepared. Each cell culture was treated with 4 conditions.

animalCulture <- c(rep("A1", 4), rep("A2", 4), rep("A3", 4), rep("A4", 4))
treatm <- rep(c("Cond1", "Cond2", "Cond3", "Cond4"), 4)
df <- data.frame(animalCulture = animalCulture, treatm = treatm)

To construct a model matrix, I was thinking blocking by the animalCulture variable

model.ma <- model.matrix(~0+df$animalCulture+df$treatm)

Then variances and contrasts will be defined as follow:

fit <- lmFit(eset, model.ma)
contrastmatrix <- makeContrasts(Cond1-Cond2,
Cond1-Cond3,
Cond1-Cond4,
Cond2-Cond3,
Cond2-Cond4,
cond3-Cond4,
levels=model.ma)
fit2 <- contrasts.fit(fit, contrastmatrix)
ebayes <- eBayes(fit2)


Is this the correct way to model the experimental design?

Is this the correct way to extract contrasts between conditions accounting for samples that comefrom the same animal?

The output of the model moatrix excludes the Cond1. What is the statistical meaning of this?

Thanks for the help

modified 3.7 years ago by Steve Lianoglou12k • written 3.8 years ago by serpalma.v50

Your limma tag is misspelt, so people who could answer you won't see this post. Also, the markdown stuff doesn't work here, so you have to format code using the options in the post editor.

Answer: Model matrix for anova in limma
4
3.8 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

You're almost there. If you look at the column names for model.ma, you should get something like:

[1] "df$animalCultureA1" "df$animalCultureA2" "df$animalCultureA3" [4] "df$animalCultureA4" "df$treatmCond2" "df$treatmCond3"
[7] "df$treatmCond4"  The first four coefficients are blocking factors for each animal culture, and are largely uninteresting. (Specifically, they represent the log-average expression for condition 1 in each animal.) The last three coefficients represent the log-fold change for each corresponding condition against condition 1. Obviously, condition 1 doesn't have its own term, because there's no point computing a log-fold change against itself. To test for DE between conditions 1 and 2, all you would need to do is to drop the df$treatmCond2 term. This can be done by specifying coef=5 argument in topTable, without requiring makeContrasts. The same applies for the coefficient corresponding to each other condition against condition 1. Of course, if you then want to test, e.g., conditions 2 and 3, you would then need to use makeContrasts and contrasts.fit before running eBayes and topTable:

con <- makeContrasts(df.treatmCond2 - df.treatmCond3, levels=model.ma) # renamed '$' to '.' fit <- contrasts.fit(fit, contrasts=con) Note that you'll have to fiddle with your column names to get rid of the $, makeContrasts doesn't like them.

If you want to do an ANOVA-like contrast, the null hypothesis would be that there is no DE between any of the conditions. In that case, you don't need to manually specify all pairwise comparisons between conditions. Instead, you just need to specify comparisons to one "reference" condition. For example, if condition 1 is equal to condition 2, and condition 1 is equal to condition 3, then obviously, conditions 2 and 3 will also be equal. With this in mind, if we set condition 1 as the reference (note that the idea of a reference is just for demonstration purposes, limma doesn't really care), then we can do an ANOVA-like contrast by comparing all other conditions to condition 1. This is equivalent to dropping the last three coefficients, i.e., set coef=5:7 in topTable.

What would be in practice if the 0 is replaced by a 1 in the model?

model.matrix(~1+df$animalCulture+df$treatm) or model.matrix(~df$animalCulture+df$treatm)

I have trouble figuring out how the blocking would work out in that situation.

You'd simply switch to an intercept model, where the first coefficient is an all-ones intercept column (probably representing the log-expression of condition 1 in culture A1) and the next three coefficients are blocking terms representing the average log-fold change of the samples in culture A2/A3/A4 over the corresponding sample in A1. The interpretation of your coefficients of interest (i.e., the last three terms for condition-specific log-fold changes) shouldn't be affected.

Answer: Model matrix for anova in limma
1
3.7 years ago by
serpalma.v50
Germany
serpalma.v50 wrote:

I very much appreciate the help in this thread. However, one question is unanswered: how to make the contrasts?

After defining the design matrix and run lmFit in the expression set, I want to retrieve all contrasts by designing a contrast matrix with contrastmatrix:

The problem is that Cond1 is gone and cannot be used in the function. And if I take Cond1 out, I get a very strange matrix.

My question is:

how to acount for the contrasts including Cond1 and how to model it so that the contrasts occur within individuals?

Thanks

1

Using the design matrix in your original post, you can set up contrasts like so (after replacing \$ with .):

contrastmatrix <- makeContrasts(df.treatmCond2, # Condition 2 vs 1
df.treatmCond3, # Condition 3 vs 1
df.treatmCond4, # Condition 4 vs 1
df.treatmCond2-df.treatmCond3, # Condition 2 vs 3
df.treatmCond2-df.treatmCond4, # Condition 2 vs 4
df.treatmCond3-df.treatmCond4, # Condition 3 vs 4
levels=model.ma)

This is possible because each Cond* coefficient represents the log-fold change of the corresponding condition over condition 1. Thus, you don't need to have Cond1 in the specification of your contrast matrix - it's already been accounted for.

Answer: Model matrix for anova in limma
0
3.8 years ago by
serpalma.v50
Germany
serpalma.v50 wrote:

Dear Aron,

So if the first four coefficients are uninteresting, are they still necessary?

I am interested to account for the within-individual variability.

1

They are biologically uninteresting, but statistically necessary, otherwise the differences in absolute expression between individuals will inflate the variance. For example, let's say that condition A and B had log-expression values of 10 and 12 in individual 1, and 20 and 22 in individual 2. The variance in the log-expression would be really high between individuals; but we don't care about that, because we're looking for changes within individuals which are much more consistent (change of 2 between A and B in both). If you didn't have the first four terms, your variance would be unnecessarily increased such that you'd lose power to detect DE. Including them into the model "absorbs" individual-specific expression and ensures that only differences within individuals are used to calculate the variance. Any within-individual changes will be captured by the last three coefficients.