Search
Question: Limma: Paired samples, multiple groups: problems understanding contrasts and model.matrix
2
16 months ago by
Anna 40
Sweden
Anna 40 wrote:

Hi everyone,

I am trying to sort out how to properly analyze a microarray experiment with multiple groups and "paired" samples and despite my best efforts I am still a bit unsure if I am doing it right.

Here is my set up: I have collected four different cell types from six patients. I want to compare the cell types to each other, but block for "patient effects". Here is an abbreviated list of samples and conditions (samples). My arraydata (from affymetrix) is in an expression set I call exprsset.

<caption>samples</caption>
 Array Fraction Patient A1 CellType1 P1 A2 CellType2 P1 A3 CellType3 P1 A4 CellType4 P1 ... ... ... A24 CellType4 P6

So, combining the description for paired samples and multiple groups from limmas user's guide gives something like

patient<-factor(samples$Patient) fraction <- factor(samples$Fraction, levels=c("CellType1, "CellType2", "CellType3", "CellType4"))
1.
design <- model.matrix(~patient+fraction)
fit <- eBayes(fit)

But from here I don't know how to get at the contrasts I am after (pairwise comparisons of cell types): It seems like R is always contrasting against one sample? If so, how do I get the other comparisons (that does not include this reference sample)  and how does R choose this reference?

I couldn't sort this out so after some googling (especially these posts: Question on unbalanced paired design  and https://support.bioconductor.org/p/55255/),  I instead tried this:

2.
design <- model.matrix(~ 0 + fraction + patient )
contrasts <- makeContrasts(fractionCelltype1 - fractionCellType2, fractionCellType1-fractionCellType3, fractionCellType1 - fractionCelltype4, (all other pairwise comparisons), levels=design)
fit2 <- contrasts.fit(fit, contrasts)
fit2 <- eBayes(fit2)

topTable with the coef parameter then seems to give me the comparisons I am after so:

My main question is: does the nr 2 "script" do what I hope, i.e. that is give me the pairwise celltype comparisons "blocked" for patient effects? If not, how do I correct it?

design <- model.matrix(~ 0 + patient + fraction)

​and use the same contrasts as in 2., I get: Warning message:In contrasts.fit(fit, contrast.matrix) :  row names of contrasts don't match col names of coefficients.

Looking at the design matrix: I have 1 less fraction column that I have fractions, so one of the fractions are "missing", which of course makes a discrepancy with my contrasts, hence the error. The same goes for patients if I use the formula for design under 2. (but since I don't make "patient contrasts" in the makeContrasts(), that doesn't seem to matter...). Naively put: Why does one condition "disappear" in the design matrix? Is it again a reference level of some sort and how do I then handle this?

If someone  could explain this or point me to some appropriate tutorial about design matrices and how they relate to the linear models, I would greatly appreciate it! I really bugs me that I can't seem to get my head around this!

Best regards,

Anna

modified 16 months ago • written 16 months ago by Anna 40
4
16 months ago by
United States
James W. MacDonald47k wrote:

You will probably find it easier to understand if you switch the order of patient and fraction. That way you will have coefficients for all the fractions, and can make all the comparisons you want.

To answer your questions, R selects the first level of a factor as the baseline. You can make any comparison you want with either model, but you need to do more algebra to get all the comparisons you care about. In both models (the way you are specifying them), the fractionCellType1 will be your baseline, and all other treatments are compared to that.

In other words fractionCellType2 is actually fractionCellType2 - fractionCellType1. And fractionCellType3 is fractionCellType3 - fractionCellType1. So testing any of those coefficients is actually testing that the difference between those groups is equal to zero. If you care about say fractionCellType3  - fractionCellType2, you need to make that contrast explicitly (which will be (fractionCellType3 - fractionCellType1) - (fractionCellType2 - fractionCellType1), and the fractionCellType1 sample gets cancelled out).

If you specify the model as I suggest, then you can just make all the comparisons you care about directly, rather than having to remember that your coefficients are already implicit differences.

Have you read the limma User's Guide? That's a good tutorial.

0
16 months ago by
Anna 40
Sweden
Anna 40 wrote:

Thank you, that helped a lot!

I have read the Limma user's guide, it's great! But still, I didn't get these things. After reading your answer and this http://genomicsclass.github.io/book/pages/expressing_design_formula.html I feel a little bit more sure.

Anyways, from the last link I also found the relevel() function which I guess can be used to change the baseline (reference level) if one wants to (to make contrasting more convenient).

/Anna

2

You can use relevel, or you can specify the baseline at the outset. It's numero-alphabetic by default, so if you have say Untreated and Treated samples, the default is

> factor(rep(c("Treated","Untreated"), 4))
[1] Treated   Untreated Treated   Untreated Treated   Untreated Treated
[8] Untreated
Levels: Treated Untreated

And  you can switch that at the outset by adding the levels argument.

> factor(rep(c("Treated","Untreated"), 4), levels = c("Untreated","Treated"))
[1] Treated   Untreated Treated   Untreated Treated   Untreated Treated
[8] Untreated
Levels: Untreated Treated