Limma Model Design Question
Entering edit mode
Last seen 28 days ago
United Kingdom


I have an experiment that contains 3 treatments and a technical effect consisting of three batches. 

treatment <- c("a","a","a","b","b","b","c","c","c")

batch <-c("i","ii","iii","i","ii","iii","i","ii","iii")

I realise that looks like a paired sample design, it's just for simplicity. 

design <- model.matrix(~0 + as.factor(treatment) + as.factor(batch))

When I run the above code, I get a design matrix with 5 column names, all three of my treatments, then only two of my batches. I'm not sure why this is happening, I assumed that because I've added zero at the beginning of my model design, that stops terms from being absorbed into the intercept, can anyone shed any light on why I'm only seeing 5 terms? 

I realise I could add in the batch as an "extra" variable using:

design <- model.matrix(~0 + as.factor(treatment), as.factor(batch))

but ultimately, I'd like to include multiple variables that could account for technical variation. 


limma design matrix limma • 1.7k views
Entering edit mode
Last seen 1 day ago
United States

The other batch is absorbed in to the treatments. You can think of the coefficients in your model as 'adjustments' that make things comparable. The treatments are the means of each treatment for batch 'i', and the batch coefficients are the difference between say, batch ii and batch i, which then adjusts the data for batch ii (by subtracting out the mean difference between batch ii and batch i) so that you can compare data between batches.

Algebraically, for a given gene in batch i, you are saying

Gene expression = mean(gene expression, batch i) + error

and for the same gene in batch ii you are saying

Gene expression = mean(gene expression, batch i) + mean(batch ii - batch i) + error

Where you capture (and remove) the mean difference between the two batches in the second coefficient, to allow for inter-batch comparisons. The same thing is true for batch iii.

Edit: This doesn't necessarily allow for inter-batch comparisons. Instead, it is 'controlling for the batch effect', thus allowing you to make comparisons between treatments without having to worry that the batch effect is biasing your results.

Entering edit mode

Indeed, if your design were to have all three batch terms and all three treatments, it would not be of full rank:

> <- cbind(model.matrix(~0+treatment), model.matrix(~0+batch))
> colnames(
[1] "treatmenta" "treatmentb" "treatmentc" "batchi"     "batchii"   
[6] "batchiii"  
> qr($rank
[1] 5

There are linear dependencies between your columns in this design matrix. More specifically, the row sums of the first 3 columns in gives the same vector (all-ones) as the row sums of the last 3 columns. This means that you'd have infinite solutions for the fitted value vector; you could just increase the estimates of the first 3 coefficients by any value, so long as you decrease the estimates of the last three by the same value. Obviously, this is not ideal, as we want one solution.

Entering edit mode

James, Aaron, thanks for your explanations. That helps clear things up. So if I were to make contrasts and do an lm fit, as per below, that should look at 'a' against 'b', but control for the batch effect?

design   <- model.matrix(~0 + as.factor(treatment) + as.factor(batch))
fit      <- lmFit(normalised_data, design)
cont_mat <- makeContrasts(a-b)
fit2     <-, contrasts=cont_mat)
fit2     <- eBayes(fit2)
Entering edit mode

That is correct.

Entering edit mode

The gist of your code is correct. However, you should set levels=design in the call to makeContrasts, and rename the first three columns of design to c('a','b','c').

Entering edit mode

Already got that in there, but thanks for the clarification :) 


Login before adding your answer.

Traffic: 454 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6