Model matrix for anova in limma
3
0
Entering edit mode
serpalma.v ▴ 60
@serpalmav-8912
Last seen 2.2 years ago
Germany

Dear members,
I am currently learning how to analyse micro array data and I am having problems designing a matrix for paired anova using limma.
The experiment consists in 4 animals. From each animal, 4 cell cultures where prepared. Each cell culture was treated with 4 conditions.

animalCulture <- c(rep("A1", 4), rep("A2", 4), rep("A3", 4), rep("A4", 4))
treatm <- rep(c("Cond1", "Cond2", "Cond3", "Cond4"), 4)
df <- data.frame(animalCulture = animalCulture, treatm = treatm)

To construct a model matrix, I was thinking blocking by the animalCulture variable

model.ma <- model.matrix(~0+df$animalCulture+df$treatm)

Then variances and contrasts will be defined as follow:

fit <- lmFit(eset, model.ma)
contrastmatrix <- makeContrasts(Cond1-Cond2,
                                Cond1-Cond3,
                                Cond1-Cond4,
                                Cond2-Cond3,
                                Cond2-Cond4,
                                cond3-Cond4,
                                levels=model.ma)
fit2 <- contrasts.fit(fit, contrastmatrix)
ebayes <- eBayes(fit2) 

Is this the correct way to model the experimental design? 

Is this the correct way to extract contrasts between conditions accounting for samples that comefrom the same animal?

The output of the model moatrix excludes the Cond1. What is the statistical meaning of this?

Thanks for the help

limma design matrix microarray limma • 5.0k views
ADD COMMENT
0
Entering edit mode

Your limma tag is misspelt, so people who could answer you won't see this post. Also, the markdown stuff doesn't work here, so you have to format code using the options in the post editor.

ADD REPLY
4
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

You're almost there. If you look at the column names for model.ma, you should get something like:

[1] "df$animalCultureA1" "df$animalCultureA2" "df$animalCultureA3"
[4] "df$animalCultureA4" "df$treatmCond2"     "df$treatmCond3"    
[7] "df$treatmCond4"    

The first four coefficients are blocking factors for each animal culture, and are largely uninteresting. (Specifically, they represent the log-average expression for condition 1 in each animal.) The last three coefficients represent the log-fold change for each corresponding condition against condition 1. Obviously, condition 1 doesn't have its own term, because there's no point computing a log-fold change against itself.

To test for DE between conditions 1 and 2, all you would need to do is to drop the df$treatmCond2 term. This can be done by specifying coef=5 argument in topTable, without requiring makeContrasts. The same applies for the coefficient corresponding to each other condition against condition 1. Of course, if you then want to test, e.g., conditions 2 and 3, you would then need to use makeContrasts and contrasts.fit before running eBayes and topTable:

con <- makeContrasts(df.treatmCond2 - df.treatmCond3, levels=model.ma) # renamed '$' to '.'
fit <- contrasts.fit(fit, contrasts=con)

Note that you'll have to fiddle with your column names to get rid of the $, makeContrasts doesn't like them.

If you want to do an ANOVA-like contrast, the null hypothesis would be that there is no DE between any of the conditions. In that case, you don't need to manually specify all pairwise comparisons between conditions. Instead, you just need to specify comparisons to one "reference" condition. For example, if condition 1 is equal to condition 2, and condition 1 is equal to condition 3, then obviously, conditions 2 and 3 will also be equal. With this in mind, if we set condition 1 as the reference (note that the idea of a reference is just for demonstration purposes, limma doesn't really care), then we can do an ANOVA-like contrast by comparing all other conditions to condition 1. This is equivalent to dropping the last three coefficients, i.e., set coef=5:7 in topTable.

ADD COMMENT
0
Entering edit mode

What would be in practice if the 0 is replaced by a 1 in the model? 

model.matrix(~1+df$animalCulture+df$treatm) or model.matrix(~df$animalCulture+df$treatm)

I have trouble figuring out how the blocking would work out in that situation.

ADD REPLY
0
Entering edit mode

You'd simply switch to an intercept model, where the first coefficient is an all-ones intercept column (probably representing the log-expression of condition 1 in culture A1) and the next three coefficients are blocking terms representing the average log-fold change of the samples in culture A2/A3/A4 over the corresponding sample in A1. The interpretation of your coefficients of interest (i.e., the last three terms for condition-specific log-fold changes) shouldn't be affected.

ADD REPLY
1
Entering edit mode
serpalma.v ▴ 60
@serpalmav-8912
Last seen 2.2 years ago
Germany

I very much appreciate the help in this thread. However, one question is unanswered: how to make the contrasts?

After defining the design matrix and run lmFit in the expression set, I want to retrieve all contrasts by designing a contrast matrix with contrastmatrix:

The problem is that Cond1 is gone and cannot be used in the function. And if I take Cond1 out, I get a very strange matrix.

My question is: 

how to acount for the contrasts including Cond1 and how to model it so that the contrasts occur within individuals?

 

Thanks

 

 

 

 

ADD COMMENT
1
Entering edit mode

Using the design matrix in your original post, you can set up contrasts like so (after replacing $ with .):

contrastmatrix <- makeContrasts(df.treatmCond2, # Condition 2 vs 1
                                df.treatmCond3, # Condition 3 vs 1
                                df.treatmCond4, # Condition 4 vs 1
                                df.treatmCond2-df.treatmCond3, # Condition 2 vs 3
                                df.treatmCond2-df.treatmCond4, # Condition 2 vs 4
                                df.treatmCond3-df.treatmCond4, # Condition 3 vs 4
                                levels=model.ma)

This is possible because each Cond* coefficient represents the log-fold change of the corresponding condition over condition 1. Thus, you don't need to have Cond1 in the specification of your contrast matrix - it's already been accounted for.

ADD REPLY
0
Entering edit mode
serpalma.v ▴ 60
@serpalmav-8912
Last seen 2.2 years ago
Germany

Dear Aron,

So if the first four coefficients are uninteresting, are they still necessary?

I am interested to account for the within-individual variability. 

 

 

ADD COMMENT
1
Entering edit mode

They are biologically uninteresting, but statistically necessary, otherwise the differences in absolute expression between individuals will inflate the variance. For example, let's say that condition A and B had log-expression values of 10 and 12 in individual 1, and 20 and 22 in individual 2. The variance in the log-expression would be really high between individuals; but we don't care about that, because we're looking for changes within individuals which are much more consistent (change of 2 between A and B in both). If you didn't have the first four terms, your variance would be unnecessarily increased such that you'd lose power to detect DE. Including them into the model "absorbs" individual-specific expression and ensures that only differences within individuals are used to calculate the variance. Any within-individual changes will be captured by the last three coefficients.

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6