Hi everyone,
I am trying to sort out how to properly analyze a microarray experiment with multiple groups and "paired" samples and despite my best efforts I am still a bit unsure if I am doing it right.
Here is my set up: I have collected four different cell types from six patients. I want to compare the cell types to each other, but block for "patient effects". Here is an abbreviated list of samples and conditions (samples). My arraydata (from affymetrix) is in an expression set I call exprsset.
<caption>samples</caption>Array | Fraction | Patient |
A1 | CellType1 | P1 |
A2 | CellType2 | P1 |
A3 | CellType3 | P1 |
A4 | CellType4 | P1 |
... | ... | ... |
A24 | CellType4 | P6 |
So, combining the description for paired samples and multiple groups from limmas user's guide gives something like
patient<-factor(samples$Patient) fraction <- factor(samples$Fraction, levels=c("CellType1, "CellType2", "CellType3", "CellType4"))
1. design <- model.matrix(~patient+fraction) fit <- lmfit(exprsset, design) fit <- eBayes(fit)
But from here I don't know how to get at the contrasts I am after (pairwise comparisons of cell types): It seems like R is always contrasting against one sample? If so, how do I get the other comparisons (that does not include this reference sample) and how does R choose this reference?
I couldn't sort this out so after some googling (especially these posts: Question on unbalanced paired design and https://support.bioconductor.org/p/55255/), I instead tried this:
2. design <- model.matrix(~ 0 + fraction + patient ) fit <- lmfit(exprsset, design) contrasts <- makeContrasts(fractionCelltype1 - fractionCellType2, fractionCellType1-fractionCellType3, fractionCellType1 - fractionCelltype4, (all other pairwise comparisons), levels=design) fit2 <- contrasts.fit(fit, contrasts) fit2 <- eBayes(fit2)
topTable with the coef parameter then seems to give me the comparisons I am after so:
My main question is: does the nr 2 "script" do what I hope, i.e. that is give me the pairwise celltype comparisons "blocked" for patient effects? If not, how do I correct it?
What I still don't understand here is, if I in the design instead write:
design <- model.matrix(~ 0 + patient + fraction)
and use the same contrasts as in 2., I get: Warning message:In contrasts.fit(fit, contrast.matrix) : row names of contrasts don't match col names of coefficients.
Looking at the design matrix: I have 1 less fraction column that I have fractions, so one of the fractions are "missing", which of course makes a discrepancy with my contrasts, hence the error. The same goes for patients if I use the formula for design under 2. (but since I don't make "patient contrasts" in the makeContrasts(), that doesn't seem to matter...). Naively put: Why does one condition "disappear" in the design matrix? Is it again a reference level of some sort and how do I then handle this?
If someone could explain this or point me to some appropriate tutorial about design matrices and how they relate to the linear models, I would greatly appreciate it! I really bugs me that I can't seem to get my head around this!
Best regards,
Anna
You can use relevel, or you can specify the baseline at the outset. It's numero-alphabetic by default, so if you have say Untreated and Treated samples, the default is
And you can switch that at the outset by adding the levels argument.
Also, when replying to an answer, don't use the 'Add your answer' box below. That's for adding answers. Instead, click the ADD COMMENT (or ADD REPLY) buttons and use the box that pops up.
Thanks again! Point taken about answering/commenting. Will remember that in the future (as I can't seem to edit now).