Limma: Paired samples, multiple groups: problems understanding contrasts and model.matrix
2
3
Entering edit mode
Anna ▴ 50
@anna-5560
Last seen 7.6 years ago
Sweden

Hi everyone,

I am trying to sort out how to properly analyze a microarray experiment with multiple groups and "paired" samples and despite my best efforts I am still a bit unsure if I am doing it right.

Here is my set up: I have collected four different cell types from six patients. I want to compare the cell types to each other, but block for "patient effects". Here is an abbreviated list of samples and conditions (samples). My arraydata (from affymetrix) is in an expression set I call exprsset.

<caption>samples</caption>
Array Fraction Patient
A1 CellType1 P1
A2 CellType2 P1
A3 CellType3 P1
A4 CellType4 P1
... ... ...
A24 CellType4 P6

So, combining the description for paired samples and multiple groups from limmas user's guide gives something like

patient<-factor(samples$Patient)
fraction <- factor(samples$Fraction, levels=c("CellType1, "CellType2", "CellType3", "CellType4"))
1.
design <- model.matrix(~patient+fraction)
fit <- lmfit(exprsset, design)
fit <- eBayes(fit)

But from here I don't know how to get at the contrasts I am after (pairwise comparisons of cell types): It seems like R is always contrasting against one sample? If so, how do I get the other comparisons (that does not include this reference sample)  and how does R choose this reference?

I couldn't sort this out so after some googling (especially these posts: Question on unbalanced paired design  and https://support.bioconductor.org/p/55255/),  I instead tried this:

2.
design <- model.matrix(~ 0 + fraction + patient )
fit <- lmfit(exprsset, design)
contrasts <- makeContrasts(fractionCelltype1 - fractionCellType2, fractionCellType1-fractionCellType3, fractionCellType1 - fractionCelltype4, (all other pairwise comparisons), levels=design)
fit2 <- contrasts.fit(fit, contrasts)
fit2 <- eBayes(fit2)

topTable with the coef parameter then seems to give me the comparisons I am after so: 

My main question is: does the nr 2 "script" do what I hope, i.e. that is give me the pairwise celltype comparisons "blocked" for patient effects? If not, how do I correct it?

What I still don't understand here is, if I in the design instead write:

design <- model.matrix(~ 0 + patient + fraction) 

​and use the same contrasts as in 2., I get: Warning message:In contrasts.fit(fit, contrast.matrix) :  row names of contrasts don't match col names of coefficients. 

Looking at the design matrix: I have 1 less fraction column that I have fractions, so one of the fractions are "missing", which of course makes a discrepancy with my contrasts, hence the error. The same goes for patients if I use the formula for design under 2. (but since I don't make "patient contrasts" in the makeContrasts(), that doesn't seem to matter...). Naively put: Why does one condition "disappear" in the design matrix? Is it again a reference level of some sort and how do I then handle this?

If someone  could explain this or point me to some appropriate tutorial about design matrices and how they relate to the linear models, I would greatly appreciate it! I really bugs me that I can't seem to get my head around this! 

Best regards,

Anna

microarray limma design and contrast matrix • 6.8k views
ADD COMMENT
4
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

You will probably find it easier to understand if you switch the order of patient and fraction. That way you will have coefficients for all the fractions, and can make all the comparisons you want.

To answer your questions, R selects the first level of a factor as the baseline. You can make any comparison you want with either model, but you need to do more algebra to get all the comparisons you care about. In both models (the way you are specifying them), the fractionCellType1 will be your baseline, and all other treatments are compared to that.

In other words fractionCellType2 is actually fractionCellType2 - fractionCellType1. And fractionCellType3 is fractionCellType3 - fractionCellType1. So testing any of those coefficients is actually testing that the difference between those groups is equal to zero. If you care about say fractionCellType3  - fractionCellType2, you need to make that contrast explicitly (which will be (fractionCellType3 - fractionCellType1) - (fractionCellType2 - fractionCellType1), and the fractionCellType1 sample gets cancelled out).

If you specify the model as I suggest, then you can just make all the comparisons you care about directly, rather than having to remember that your coefficients are already implicit differences.

Have you read the limma User's Guide? That's a good tutorial.

ADD COMMENT
0
Entering edit mode
Anna ▴ 50
@anna-5560
Last seen 7.6 years ago
Sweden

Thank you, that helped a lot!

I have read the Limma user's guide, it's great! But still, I didn't get these things. After reading your answer and this http://genomicsclass.github.io/book/pages/expressing_design_formula.html I feel a little bit more sure.

Anyways, from the last link I also found the relevel() function which I guess can be used to change the baseline (reference level) if one wants to (to make contrasting more convenient).

/Anna

 

ADD COMMENT
2
Entering edit mode

You can use relevel, or you can specify the baseline at the outset. It's numero-alphabetic by default, so if you have say Untreated and Treated samples, the default is

> factor(rep(c("Treated","Untreated"), 4))
[1] Treated   Untreated Treated   Untreated Treated   Untreated Treated  
[8] Untreated
Levels: Treated Untreated

And  you can switch that at the outset by adding the levels argument.

> factor(rep(c("Treated","Untreated"), 4), levels = c("Untreated","Treated"))
[1] Treated   Untreated Treated   Untreated Treated   Untreated Treated  
[8] Untreated
Levels: Untreated Treated

Also, when replying to an answer, don't use the 'Add your answer' box below. That's for adding answers. Instead, click the ADD COMMENT (or ADD REPLY) buttons and use the box that pops up.

ADD REPLY
0
Entering edit mode

Thanks again! Point taken about answering/commenting. Will remember that in the future (as I can't seem to edit now).

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6