Creating design matrix in LIMMA
3
0
Entering edit mode
@sgblackpearl-12124
Last seen 5.6 years ago

Hi everyone, I am stuck with creating design matrix in LIMMA for my microarray experimental design. The experiment consists of 40 Agilent arrays. The subjects were treated with two stress conditions (S1 and S2). In each stress condition, the subjects were sampled in 5 timepoints (0, 6, 12, 24 and 48). Microarray experiment was performed in such a way that every fourth array of a particular timepoint, irrespective of stress treatment, is dye swapped. Basically, it is a time course experiment on the same subject with two different stresses. I want to to know which are the genes important/responsible for stress 1 or 2. The target file looks like below:

 SampleNumber FileName Cy3 Cy5 Stress 1 Array1 pool1 S1_Time0 S1 2 Array2 pool1 S1_Time0 S1 3 Array3 pool1 S1_Time0 S1 4 Array4 S1_Time0 pool1 S1 5 Array5 pool1 S1_Time6 S1 6 Array6 pool1 S1_Time6 S1 7 Array7 pool1 S1_Time6 S1 8 Array8 S1_Time6 pool1 S1 9 Array9 pool1 S1_Time12 S1 10 Array10 pool1 S1_Time12 S1 11 Array11 pool1 S1_Time12 S1 12 Array12 S1_Time12 pool1 S1 13 Array13 pool1 S1_Time24 S1 14 Array14 pool1 S1_Time24 S1 15 Array15 pool1 S1_Time24 S1 16 Array16 S1_Time24 pool1 S1 17 Array17 pool1 S1_Time48 S1 18 Array18 pool1 S1_Time48 S1 19 Array19 pool1 S1_Time48 S1 20 Array20 S1_Time48 pool1 S1 21 Array21 pool2 S2_Time0 S2 22 Array22 pool2 S2_Time0 S2 23 Array23 pool2 S2_Time0 S2 24 Array24 S2_Time0 pool2 S2 25 Array25 pool2 S2_Time6 S2 26 Array26 pool2 S2_Time6 S2 27 Array27 pool2 S2_Time6 S2 28 Array28 S2_Time6 pool2 S2 29 Array29 pool2 S2_Time12 S2 30 Array30 pool2 S2_Time12 S2 31 Array31 pool2 S2_Time12 S2 32 Array32 S2_Time12 pool2 S2 33 Array33 pool2 S2_Time24 S2 34 Array34 pool2 S2_Time24 S2 35 Array35 pool2 S2_Time24 S2 36 Array36 S2_Time24 pool2 S2 37 Array37 pool2 S2_Time48 S2 38 Array38 pool2 S2_Time48 S2 39 Array39 pool2 S2_Time48 S2 40 Array40 S2_Time48 pool2 S2
limma microarray R • 1000 views
1
Entering edit mode

You haven't explained what pool1 and pool2 are. I'm guessing they are all the samples for a given stressor pooled together? Also, you say that your experiment involved 4 subjects per stressor, but you've left out important details. Were the same subjects used for both stressors? Were the same subjects used for all time points? Your wording does not make this clear. You need to fully document all the known variables in your experimental design before you can construct your design matrix.

0
Entering edit mode

Yes, pool1 & 2  are all the samples pooled from all the time points for the respective stressors. The subjects used for all the time points both stressors are same.

1
Entering edit mode

Ok, then that is an important feature of the experimental design, and you to add a column with that information to your sample table.

0
Entering edit mode

Thanks Gordon and Ryan for your valuable inputs.

I have another set of experiments which is pretty much similar to that of the above one. There also I have two different stressors, and similar dye swap but the reference RNA is same for all the arrays (essentially 'pool1' and 'pool2' from the above experiment is a common 'pool' here). I want to know genes common and unique to different stressors.

1
Entering edit mode

With a single common reference, you just need one call to modelMatrix. The multiple calls and concatenation with blockDiag were all to handle the multiple reference pools. As for the subject effects, I'm not sure how to incorporate those into a 2-color design.

2
Entering edit mode
@gordon-smyth
Last seen 32 minutes ago
WEHI, Melbourne, Australia

The fact that two different pools were used means that you almost have two independent experiments. For a study like this, the standard limma method is to make two design matrices and combine them together:

design.S1 <- modelMatrix(targets[ 1:20,], ref="pool1")
design.S2 <- modelMatrix(targets[21:40,], ref="pool2")
design <- blockDiag(design.S1, design.S2)


I would also add an intercept term to allow for probe-specific dye effects, which tend to be important for Agilent arrays:

design <- cbind(Dye=1, design)

This is equivalent to Ryan's code, but a bit simpler.

1
Entering edit mode
@ryan-c-thompson-5618
Last seen 23 months ago
Scripps Research, La Jolla, CA

Since your two stressors apparently have different reference samples (pool1 and pool2), I think the most straightforward way to implement this is by making a separate design matrix for each stressor and then combining them into a single block-diagonal matrix using Matrix::bdiag (correction: limma already provides a blockDiag function for exactly this purpose):

library(limma)
library(stringr)

# Indicate the reference sample for each row
targets$Ref <- str_replace(targets$Stress, "S", "pool")

# Split by reference and make the design matrix for each split
targets.split <- split(targets, targets$Ref) designs <- lapply(targets.split, function(tgt) { modelMatrix(targets=droplevels(tgt), ref=tgt$Ref[1])
})

# Combine the separate design matrices into one
design <- as.matrix(do.call(blockDiag, designs))
## Ensure that the design has the rows in the same order as targets
design <- design[rownames(targets),]

## Define contrasts for an ANOVA test of all time points for each
## stressor
stressor1_contrasts <- sprintf("S1_Time%i - S1_Time0", c(6, 12, 24, 48))
stressor2_contrasts <- sprintf("S2_Time%i - S2_Time0", c(6, 12, 24, 48))
## Build the contrast matrix for each ANOVA test
stressor1_cmat <- makeContrasts(contrasts=stressor1_contrasts, levels=design)
stressor2_cmat <- makeContrasts(contrasts=stressor2_contrasts, levels=design)



From there, you can use the usual limma procedure of lmFit, contrasts.fit, eBayes, and topTable to get a table of genes for each ANOVA test.

If there are any additional variables to be taken into account, such as a subject effect, you'll need to reply with more information.

1
Entering edit mode

Or instead of Matrix::bdiag() you could use the blockDiag() function already provided in the limma package.

1
Entering edit mode

Oh, I wasn't aware that limma provided such a function. Thanks for the correction! I'll edit it into my answer.

0
Entering edit mode
@steve-lianoglou-2771
Last seen 9 hours ago
United States

Woops -- provided a boneheaded answer in haste. Had a momentary lapse of reason regarding the nuts and bolts of two-color arrays.

The details of my original answer have been redacted to mitigate any damage this might cause a wary future traveler ...

Thanks to Ryan for the correction

0
Entering edit mode

I don't think this design is appropriate for a 2-color array with dye swaps. I don't have a lot of experience with dye swaps, but my understanding is that a 2-color array design typically doesn't have an intercept (since every array represents the fold change between 2 samples), and a dye-swap is implemented by flipping the signs on the corresponding row of the design matrix, not by adding an additional coefficient. This is how limma::modelMatrix does it. (Although you might also add an dye swap coefficient in addition to the sign flip, to control for the possibility that the dyes are not equivalent). This design also doesn't account for the use of two different reference samples.

I also think the design for this experiment needs to have an interaction between time and stressor, since the experimenter is interested in the separate time effects for each stressor.