How to construct contrasts for a linear model with interactions?
2
0
Entering edit mode
@vedranfranke-7218
Last seen 3.4 years ago
Germany

I have trouble figuring out how to set up contrasts for a linear model with interaction terms.

In the following example, how do I construct contrasts so that I test for the differences of species A - tissue 1 Vs the mean of everything else (rows 1 and 2 from the matrix should be the topmost ranked)?

I know it can be done through the design matrix, but am wondering whether it is possible to do it using a contrast matrix.

library(limma)
set.seed(10)

d = matrix(rnorm(80, 3), ncol=8)
n=100
d[1:2,1:2] = d[1:2,1:2] + n/2
d[3:4,3:4] = d[3:4,3:4] + n/2
d[3:4,5:6] = d[3:4,5:6] + n
d[7:8,7:8] = d[7:8,7:8] + n
d[9:10,1:4] = d[9:10,1:4] + n
d = log2(d)
dat = data.frame(species=rep(c('A','B'), each=4),
            tissue=rep(rep(c('T1','T2'),each=2), times=2))
colnames(d) = with(dat, paste(species, tissue, sep='.'))
design = model.matrix(~0 + species + tissue + species:tissue , data = dat)
colnames(design) = sub(':','.',colnames(design))

fit = lmFit(d, design)

cont.mat = ?

fit2 = fit = eBayes(contrasts.fit(fit,cont.mat))

res=topTable(fit2, sort.by="p")

This question is a crosspost from other stacks, but nobody has still answered.

Tnx!

 

Edit: after a note, I added log2(d)

limma contrast linear model • 1.1k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 18 minutes ago
The city by the bay

Well, this boils down to simple arithmetic. Consider this:

A1.T1 <- design[1,] # first replicate of A1.T1
everything.else <- colMeans(design[c(3,5,7),]) # first replicate of others
cont.mat <- A1.T1 - everything.else

Obviously, this would be so much easier with a one-way layout.

As an aside, in your matrix, rows 1 and 2 will not be top-ranked for your specified contrast. Rather, rows 9 and 10 will be top ranked. This is because the values in d are treated as log-expression values, such that we're looking at differences rather than fold changes between them. In rows 1 and 2, the difference between A1.T1 and the mean of everything else is n/2; in rows 9 and 10, the difference is between n+3 and (n/3)+3, which is 2n/3. This is obviously larger than n/2.

ADD COMMENT
0
Entering edit mode

Dear Aaron, 

Thank you so much for your explanation!

I added the log2 transformation into the data. 

ADD REPLY
0
Entering edit mode

Dear Aaron, 

Thank you so much for your explanation!

I added the log2 transformation into the data. 

ADD REPLY
0
Entering edit mode

You also need to run eBayes on fit2 after running contrasts.fit.

ADD REPLY
2
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

You are making things needlessly complex. Rather than setting up a two-way ANOVA with interaction, it's easier to just set up a one-way model and construct whatever contrasts you care about.

> grps <- factor(apply(dat, 1, paste, collapse = "_"))
> design <- model.matrix(~0+grps)
> colnames(design) <- gsub("grps", "", colnames(design))
> design
  A_T1 A_T2 B_T1 B_T2
1    1    0    0    0
2    1    0    0    0
3    0    1    0    0
4    0    1    0    0
5    0    0    1    0
6    0    0    1    0
7    0    0    0    1
8    0    0    0    1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$grps
[1] "contr.treatment"

> contrast <- matrix(c(1,-1/3,-1/3,-1/3), ncol = 1)
> contrast
           [,1]
[1,]  1.0000000
[2,] -0.3333333
[3,] -0.3333333
[4,] -0.3333333
>

 

ADD COMMENT
0
Entering edit mode

Dear James, 

Thank you so much for the prompt answer. I am asking because the abovementioned apporach does not work in limma; as explained here:

Limma: Contrasts comparing one factor to multiple others
http://grokbase.com/t/r/bioconductor/131qjfkt81/bioc-how-to-pool-subgroups-for-makecontrasts-and-subsequent-limma-analysis

When you try to execute it you get rows 9 and 10 as the topmost result and not rows 1 and 2.

 

ADD REPLY
0
Entering edit mode

Getting 9 and 10 as the top-ranked rows is a mathematical inevitability for the contrast that you've requested. If you don't want it, you should switch to a different comparison strategy, e.g., compare A1.T1 to each other group and only keep genes that have significant differences in all of these comparisons. This will favour rows 1 and 2 instead.

ADD REPLY
0
Entering edit mode

If I calculate it by hand I get the following result:

(rowMeans(log2(d[1:2,c(1:2)]))) - (rowMeans(log2(d[1:2,-c(1:2)])))
[1] 4.501986 4.764797
(rowMeans(log2(d[9:10,c(1:2)]))) - (rowMeans(log2(d[9:10,-c(1:2)])))
[1] 3.579294 3.365360

Sorry for bugging, but could you please explain to me why I should get rows 9 and 10 as the most significant result?

ADD REPLY
0
Entering edit mode

As I mentioned in my original post, limma assumes that your values are already log-transformed. If you don't log-transform d before you feed it into limma, then the results won't match up to what you expect.

ADD REPLY

Login before adding your answer.

Traffic: 245 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6