Question

Extract specific contrast from paired multifactor design matrix

0

Entering edit mode

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 11 weeks ago

Italy

Dear All!

I have the following setup for an RNA-seq experiment: 4 individuals, 2 of them have a disease, 2 are healthy controls, cells from each individual were cultured using two different conditions. So, the setup is very similar to the one described in the edgeR user guide (Section 3.5) or in the DESeq2 vignette (Section 3.12.1). Following the edgeR user guide I generated the design matrix (CTRL are the healthy controls, DIS the patients, condition A and B code for the two different cell culture conditions):

colD <- data.frame(type=rep(c("DIS", "CTRL"), each=4),
                   condition=rep(c("A", "B"), 4),
                   individual=rep(1:4, each=2),
                   stringsAsFactors=TRUE)
## add the nested individual following the edgeR user guide
colD <- cbind(colD, nind=factor(rep(rep(1:2, each=2), 2)))
colD

  type condition individual nind
1  DIS         A          1    1
2  DIS         B          1    1
3  DIS         A          2    2
4  DIS         B          2    2
5 CTRL         A          3    1
6 CTRL         B          3    1
7 CTRL         A          4    2
8 CTRL         B          4    2
## the model matrix for the experiment
model.matrix(~type + type:nind + type:condition, colD)

  (Intercept) typeDIS typeCTRL:nind2 typeDIS:nind2 typeCTRL:conditionB
1           1       1              0             0                   0
2           1       1              0             0                   0
3           1       1              0             1                   0
4           1       1              0             1                   0
5           1       0              0             0                   0
6           1       0              0             0                   1
7           1       0              1             0                   0
8           1       0              1             0                   1
  typeDIS:conditionB
1                  0
2                  1
3                  0
4                  1
5                  0
6                  0
7                  0
8                  0
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$type
[1] "contr.treatment"

attr(,"contrasts")$nind
[1] "contr.treatment"

attr(,"contrasts")$condition
[1] "contr.treatment"

The questions I want to answer in that RNA-Seq experiment are:

1) genes differentially expressed between DIS and CTRL, both cultured at condition A.

2) genes differentially expressed in DIS between B and A.

3) genes differentially expressed in CTRL between B and A.

4) genes differentially regulated between (DIS, B vs A) and (CTRL, B vs A).

Now, to answer question 2) I can extract coefficient "typeDIS:conditionB", to answer 3 "typeCTRL:conditionB" and the contrast between both I can extract with results(..., contrast=c(0, 0, 0, 0, -1, 1)). It is however not clear to me how to answer question 1), i.e. the difference between DIS cells and condition A and CTRL cells and condition A. I first thought that I can use "typeDIS" but to me that looks rather like a comparison between all DIS and all CTRL samples (so conditions A and B pooled).

How can I answer question 1) using the design above?

thanks in advance for any comments and answers!

jo

edger deseq2 rna-seq • 1.8k views

ADD COMMENT • link updated 9.5 years ago by James W. MacDonald 67k • written 9.5 years ago by Johannes Rainer ★ 2.1k

score 3 · Accepted Answer · 2015-07-14

3

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

No, typeDIS is estimating the difference between Disease and Control in condition A.

You can see this by noting that the columns of your design are the coefficients, and the rows represent your samples. The fifth row (which corresponds to Control, condition A) has only one 1, in the intercept column. Since this is a Control, condition A sample, this implies that the intercept is estimating the mean of the Control, condition A samples.

The first row of your design has just two 1s, one for the intercept (which we already decided was the mean of the Control, condition A samples), and one for typeDIS. Since this sample is a Disease, condition A sample, typeDIS HAS to be Disease, conditionA - Control, condition A. This is just algebra:

Dis/A = Ctrl/A + (something) => (something) = Dis/A - Ctrl/A

Does that make sense?

ADD COMMENT • link 9.5 years ago James W. MacDonald 67k

1

Entering edit mode

There's also an additional subtlety of the individual effect, which confounds the interpretation of the typeDIS term. Consider an example where both samples of individual 1 are highly expressed, while both samples of individual 2 are lowly expressed. Assume that the average expression of individuals 1 and 2 is equal to the average expression of individuals 3 and 4, i.e., no disease effect for condition A (or B). For simplicity, assume that both samples for the same individual have the same expression. Upon fitting, one will find that typeDIS is non-zero and positive while typeDIS:nind2 is large and negative (all other non-intercept coefficients are zero). For example, if we assign the matrix in the original post to design, we can get:

> counts <- c(200, 200, 0, 0, 100, 100, 100, 100)
> glmFit(rbind(counts), design, offset=0, dispersion=0.05)$coef
       (Intercept)   typeDIS typeCTRL:nind2 typeDIS:nind2 typeCTRL:conditionB
counts    4.383276 0.6925228     1.0527e-16     -7.378383          1.0527e-16
       typeDIS:conditionB
counts       -3.41739e-16

Treating typeDIS as the disease effect for condition A would suggest a difference where there is none. Indeed, the example in Section 3.5 in the user's guide never compares samples directly between disease states, but instead compares the log-fold changes (similar to question 4 in the original post). If we were to compare directly between diseases, I would probably use voom with a ~type*condition model and block on individual with duplicateCorrelation.

ADD REPLY • link 9.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Aaron!

There is no way that I could get that working in DESeq2 using e.g. a linear combination of coefficients (e.g. contrast=c(-0.5, 0.5, -0.5, 0.5, 1, -1))?

ADD REPLY • link 9.5 years ago Johannes Rainer ★ 2.1k

0

Entering edit mode

Thanks! Absolutely makes sense. Nice explanation!

However (also regarding Aaron's comment below), typeDis is in fact Disease, conditionA in individual 1, right? So that way, i.e. with the present pairing setup, there is no way I can directly compare Disease vs Control. However, if I didn't have the pairing I could in fact extract the difference Dis vs Ctrl, both condition A from typeDis if I got it correctly.

ADD REPLY • link 9.5 years ago Johannes Rainer ★ 2.1k

1

Entering edit mode

Exactly. I didn't think things all the way through - as Aaron notes, there are some things you can't really get at with this experimental design in the context of a generalized linear model.

ADD REPLY • link 9.5 years ago James W. MacDonald 67k