How to construct contrasts for a linear model with interactions?
2
0
Entering edit mode
@vedranfranke-7218
Last seen 3.4 years ago
Germany

I have trouble figuring out how to set up contrasts for a linear model with interaction terms.

In the following example, how do I construct contrasts so that I test for the differences of species A - tissue 1 Vs the mean of everything else (rows 1 and 2 from the matrix should be the topmost ranked)?

I know it can be done through the design matrix, but am wondering whether it is possible to do it using a contrast matrix.

library(limma)
set.seed(10)

d = matrix(rnorm(80, 3), ncol=8)
n=100
d[1:2,1:2] = d[1:2,1:2] + n/2
d[3:4,3:4] = d[3:4,3:4] + n/2
d[3:4,5:6] = d[3:4,5:6] + n
d[7:8,7:8] = d[7:8,7:8] + n
d[9:10,1:4] = d[9:10,1:4] + n
d = log2(d)
dat = data.frame(species=rep(c('A','B'), each=4),
tissue=rep(rep(c('T1','T2'),each=2), times=2))
colnames(d) = with(dat, paste(species, tissue, sep='.'))
design = model.matrix(~0 + species + tissue + species:tissue , data = dat)
colnames(design) = sub(':','.',colnames(design))

fit = lmFit(d, design)

cont.mat = ?

fit2 = fit = eBayes(contrasts.fit(fit,cont.mat))

res=topTable(fit2, sort.by="p")

This question is a crosspost from other stacks, but nobody has still answered.

Tnx!

Edit: after a note, I added log2(d)

limma contrast linear model • 1.1k views
2
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 18 minutes ago
The city by the bay

Well, this boils down to simple arithmetic. Consider this:

A1.T1 <- design[1,] # first replicate of A1.T1
everything.else <- colMeans(design[c(3,5,7),]) # first replicate of others
cont.mat <- A1.T1 - everything.else

Obviously, this would be so much easier with a one-way layout.

As an aside, in your matrix, rows 1 and 2 will not be top-ranked for your specified contrast. Rather, rows 9 and 10 will be top ranked. This is because the values in d are treated as log-expression values, such that we're looking at differences rather than fold changes between them. In rows 1 and 2, the difference between A1.T1 and the mean of everything else is n/2; in rows 9 and 10, the difference is between n+3 and (n/3)+3, which is 2n/3. This is obviously larger than n/2.

0
Entering edit mode

Dear Aaron,

Thank you so much for your explanation!

I added the log2 transformation into the data.

0
Entering edit mode

Dear Aaron,

Thank you so much for your explanation!

I added the log2 transformation into the data.

0
Entering edit mode

You also need to run eBayes on fit2 after running contrasts.fit.

2
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

You are making things needlessly complex. Rather than setting up a two-way ANOVA with interaction, it's easier to just set up a one-way model and construct whatever contrasts you care about.

> grps <- factor(apply(dat, 1, paste, collapse = "_"))
> design <- model.matrix(~0+grps)
> colnames(design) <- gsub("grps", "", colnames(design))
> design
A_T1 A_T2 B_T1 B_T2
1    1    0    0    0
2    1    0    0    0
3    0    1    0    0
4    0    1    0    0
5    0    0    1    0
6    0    0    1    0
7    0    0    0    1
8    0    0    0    1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")\$grps
[1] "contr.treatment"

> contrast <- matrix(c(1,-1/3,-1/3,-1/3), ncol = 1)
> contrast
[,1]
[1,]  1.0000000
[2,] -0.3333333
[3,] -0.3333333
[4,] -0.3333333
>

0
Entering edit mode

Dear James,

Thank you so much for the prompt answer. I am asking because the abovementioned apporach does not work in limma; as explained here:

When you try to execute it you get rows 9 and 10 as the topmost result and not rows 1 and 2.

0
Entering edit mode

Getting 9 and 10 as the top-ranked rows is a mathematical inevitability for the contrast that you've requested. If you don't want it, you should switch to a different comparison strategy, e.g., compare A1.T1 to each other group and only keep genes that have significant differences in all of these comparisons. This will favour rows 1 and 2 instead.

0
Entering edit mode

If I calculate it by hand I get the following result:

(rowMeans(log2(d[1:2,c(1:2)]))) - (rowMeans(log2(d[1:2,-c(1:2)])))
[1] 4.501986 4.764797
(rowMeans(log2(d[9:10,c(1:2)]))) - (rowMeans(log2(d[9:10,-c(1:2)])))
[1] 3.579294 3.365360


Sorry for bugging, but could you please explain to me why I should get rows 9 and 10 as the most significant result?

0
Entering edit mode

As I mentioned in my original post, limma assumes that your values are already log-transformed. If you don't log-transform d before you feed it into limma, then the results won't match up to what you expect.