Question

Limma Model Design Question

0

Entering edit mode

andrew.j.skelton73 ▴ 370

@andrewjskelton73-7074

Last seen 8 months ago

United Kingdom

Hi,

I have an experiment that contains 3 treatments and a technical effect consisting of three batches.

treatment <- c("a","a","a","b","b","b","c","c","c")

batch <-c("i","ii","iii","i","ii","iii","i","ii","iii")

I realise that looks like a paired sample design, it's just for simplicity.

design <- model.matrix(~0 + as.factor(treatment) + as.factor(batch))

When I run the above code, I get a design matrix with 5 column names, all three of my treatments, then only two of my batches. I'm not sure why this is happening, I assumed that because I've added zero at the beginning of my model design, that stops terms from being absorbed into the intercept, can anyone shed any light on why I'm only seeing 5 terms?

I realise I could add in the batch as an "extra" variable using:

design <- model.matrix(~0 + as.factor(treatment), as.factor(batch))

but ultimately, I'd like to include multiple variables that could account for technical variation.

Cheers

limma design matrix limma • 1.9k views

ADD COMMENT • link updated 9.4 years ago by James W. MacDonald 67k • written 9.4 years ago by andrew.j.skelton73 ▴ 370

score 6 · Accepted Answer · 2015-07-01

6

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The other batch is absorbed in to the treatments. You can think of the coefficients in your model as 'adjustments' that make things comparable. The treatments are the means of each treatment for batch 'i', and the batch coefficients are the difference between say, batch ii and batch i, which then adjusts the data for batch ii (by subtracting out the mean difference between batch ii and batch i) so that you can compare data between batches.

Algebraically, for a given gene in batch i, you are saying

Gene expression = mean(gene expression, batch i) + error

and for the same gene in batch ii you are saying

Gene expression = mean(gene expression, batch i) + mean(batch ii - batch i) + error

Where you capture (and remove) the mean difference between the two batches in the second coefficient, to allow for inter-batch comparisons. The same thing is true for batch iii.

Edit: This doesn't necessarily allow for inter-batch comparisons. Instead, it is 'controlling for the batch effect', thus allowing you to make comparisons between treatments without having to worry that the batch effect is biasing your results.

ADD COMMENT • link 9.4 years ago James W. MacDonald 67k

1

Entering edit mode

Indeed, if your design were to have all three batch terms and all three treatments, it would not be of full rank:

> bad.design <- cbind(model.matrix(~0+treatment), model.matrix(~0+batch))
> colnames(bad.design)
[1] "treatmenta" "treatmentb" "treatmentc" "batchi"     "batchii"   
[6] "batchiii"  
> qr(bad.design)$rank
[1] 5

There are linear dependencies between your columns in this design matrix. More specifically, the row sums of the first 3 columns in bad.design gives the same vector (all-ones) as the row sums of the last 3 columns. This means that you'd have infinite solutions for the fitted value vector; you could just increase the estimates of the first 3 coefficients by any value, so long as you decrease the estimates of the last three by the same value. Obviously, this is not ideal, as we want one solution.

ADD REPLY • link 9.4 years ago Aaron Lun ★ 28k

0

Entering edit mode

James, Aaron, thanks for your explanations. That helps clear things up. So if I were to make contrasts and do an lm fit, as per below, that should look at 'a' against 'b', but control for the batch effect?

design   <- model.matrix(~0 + as.factor(treatment) + as.factor(batch))
fit      <- lmFit(normalised_data, design)
cont_mat <- makeContrasts(a-b)
fit2     <- contrasts.fit(fit, contrasts=cont_mat)
fit2     <- eBayes(fit2)

ADD REPLY • link 9.4 years ago andrew.j.skelton73 ▴ 370

1

Entering edit mode

That is correct.

ADD REPLY • link 9.4 years ago James W. MacDonald 67k

0

Entering edit mode

The gist of your code is correct. However, you should set levels=design in the call to makeContrasts, and rename the first three columns of design to c('a','b','c').