Creating design matrix in LIMMA
Entering edit mode
Last seen 5.6 years ago

Hi everyone, I am stuck with creating design matrix in LIMMA for my microarray experimental design. The experiment consists of 40 Agilent arrays. The subjects were treated with two stress conditions (S1 and S2). In each stress condition, the subjects were sampled in 5 timepoints (0, 6, 12, 24 and 48). Microarray experiment was performed in such a way that every fourth array of a particular timepoint, irrespective of stress treatment, is dye swapped. Basically, it is a time course experiment on the same subject with two different stresses. I want to to know which are the genes important/responsible for stress 1 or 2. The target file looks like below: 

SampleNumber  FileName  Cy3 Cy5 Stress
1 Array1 pool1 S1_Time0 S1
2 Array2 pool1 S1_Time0 S1
3 Array3 pool1 S1_Time0 S1
4 Array4 S1_Time0 pool1 S1
5 Array5 pool1 S1_Time6 S1
6 Array6 pool1 S1_Time6 S1
7 Array7 pool1 S1_Time6 S1
8 Array8 S1_Time6 pool1 S1
9 Array9 pool1 S1_Time12 S1
10 Array10 pool1 S1_Time12 S1
11 Array11 pool1 S1_Time12 S1
12 Array12 S1_Time12 pool1 S1
13 Array13 pool1 S1_Time24 S1
14 Array14 pool1 S1_Time24 S1
15 Array15 pool1 S1_Time24 S1
16 Array16 S1_Time24 pool1 S1
17 Array17 pool1 S1_Time48 S1
18 Array18 pool1 S1_Time48 S1
19 Array19 pool1 S1_Time48 S1
20 Array20 S1_Time48 pool1 S1
21 Array21 pool2 S2_Time0 S2
22 Array22 pool2 S2_Time0 S2
23 Array23 pool2 S2_Time0 S2
24 Array24 S2_Time0 pool2 S2
25 Array25 pool2 S2_Time6 S2
26 Array26 pool2 S2_Time6 S2
27 Array27 pool2 S2_Time6 S2
28 Array28 S2_Time6 pool2 S2
29 Array29 pool2 S2_Time12 S2
30 Array30 pool2 S2_Time12 S2
31 Array31 pool2 S2_Time12 S2
32 Array32 S2_Time12 pool2 S2
33 Array33 pool2 S2_Time24 S2
34 Array34 pool2 S2_Time24 S2
35 Array35 pool2 S2_Time24 S2
36 Array36 S2_Time24 pool2 S2
37 Array37 pool2 S2_Time48 S2
38 Array38 pool2 S2_Time48 S2
39 Array39 pool2 S2_Time48 S2
40 Array40 S2_Time48 pool2 S2
limma microarray R • 1000 views
Entering edit mode

You haven't explained what pool1 and pool2 are. I'm guessing they are all the samples for a given stressor pooled together? Also, you say that your experiment involved 4 subjects per stressor, but you've left out important details. Were the same subjects used for both stressors? Were the same subjects used for all time points? Your wording does not make this clear. You need to fully document all the known variables in your experimental design before you can construct your design matrix.

Entering edit mode

Yes, pool1 & 2  are all the samples pooled from all the time points for the respective stressors. The subjects used for all the time points both stressors are same.

Entering edit mode

Ok, then that is an important feature of the experimental design, and you to add a column with that information to your sample table.

Entering edit mode

Thanks Gordon and Ryan for your valuable inputs.

I have another set of experiments which is pretty much similar to that of the above one. There also I have two different stressors, and similar dye swap but the reference RNA is same for all the arrays (essentially 'pool1' and 'pool2' from the above experiment is a common 'pool' here). I want to know genes common and unique to different stressors. 


Entering edit mode

With a single common reference, you just need one call to modelMatrix. The multiple calls and concatenation with blockDiag were all to handle the multiple reference pools. As for the subject effects, I'm not sure how to incorporate those into a 2-color design.

Entering edit mode
Last seen 32 minutes ago
WEHI, Melbourne, Australia

The fact that two different pools were used means that you almost have two independent experiments. For a study like this, the standard limma method is to make two design matrices and combine them together:

design.S1 <- modelMatrix(targets[ 1:20,], ref="pool1")
design.S2 <- modelMatrix(targets[21:40,], ref="pool2")
design <- blockDiag(design.S1, design.S2)

I would also add an intercept term to allow for probe-specific dye effects, which tend to be important for Agilent arrays:

design <- cbind(Dye=1, design)

This is equivalent to Ryan's code, but a bit simpler.


Entering edit mode
Last seen 23 months ago
Scripps Research, La Jolla, CA

Since your two stressors apparently have different reference samples (pool1 and pool2), I think the most straightforward way to implement this is by making a separate design matrix for each stressor and then combining them into a single block-diagonal matrix using Matrix::bdiag (correction: limma already provides a blockDiag function for exactly this purpose):


# Indicate the reference sample for each row
targets$Ref <- str_replace(targets$Stress, "S", "pool")

# Split by reference and make the design matrix for each split
targets.split <- split(targets, targets$Ref)
designs <- lapply(targets.split, function(tgt) {
    modelMatrix(targets=droplevels(tgt), ref=tgt$Ref[1])

# Combine the separate design matrices into one
design <- as.matrix(, designs))
## Ensure that the design has the rows in the same order as targets
design <- design[rownames(targets),]

## Define contrasts for an ANOVA test of all time points for each
## stressor
stressor1_contrasts <- sprintf("S1_Time%i - S1_Time0", c(6, 12, 24, 48))
stressor2_contrasts <- sprintf("S2_Time%i - S2_Time0", c(6, 12, 24, 48))
## Build the contrast matrix for each ANOVA test
stressor1_cmat <- makeContrasts(contrasts=stressor1_contrasts, levels=design)
stressor2_cmat <- makeContrasts(contrasts=stressor2_contrasts, levels=design)

From there, you can use the usual limma procedure of lmFit,, eBayes, and topTable to get a table of genes for each ANOVA test.

If there are any additional variables to be taken into account, such as a subject effect, you'll need to reply with more information.

Entering edit mode

Or instead of Matrix::bdiag() you could use the blockDiag() function already provided in the limma package.

Entering edit mode

Oh, I wasn't aware that limma provided such a function. Thanks for the correction! I'll edit it into my answer.

Entering edit mode
Last seen 9 hours ago
United States

Woops -- provided a boneheaded answer in haste. Had a momentary lapse of reason regarding the nuts and bolts of two-color arrays.

The details of my original answer have been redacted to mitigate any damage this might cause a wary future traveler ...

Thanks to Ryan for the correction


Entering edit mode

I don't think this design is appropriate for a 2-color array with dye swaps. I don't have a lot of experience with dye swaps, but my understanding is that a 2-color array design typically doesn't have an intercept (since every array represents the fold change between 2 samples), and a dye-swap is implemented by flipping the signs on the corresponding row of the design matrix, not by adding an additional coefficient. This is how limma::modelMatrix does it. (Although you might also add an dye swap coefficient in addition to the sign flip, to control for the possibility that the dyes are not equivalent). This design also doesn't account for the use of two different reference samples.

I also think the design for this experiment needs to have an interaction between time and stressor, since the experimenter is interested in the separate time effects for each stressor.


Login before adding your answer.

Traffic: 263 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6