Extract specific contrast from paired multifactor design matrix
1
0
Entering edit mode
Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 7 weeks ago
Italy

Dear All!

I have the following setup for an RNA-seq experiment: 4 individuals, 2 of them have a disease, 2 are healthy controls, cells from each individual were cultured using two different conditions. So, the setup is very similar to the one described in the edgeR user guide (Section 3.5) or in the DESeq2 vignette (Section 3.12.1). Following the edgeR user guide I generated the design matrix (CTRL are the healthy controls, DIS the patients, condition A and B code for the two different cell culture conditions):

colD <- data.frame(type=rep(c("DIS", "CTRL"), each=4),
                   condition=rep(c("A", "B"), 4),
                   individual=rep(1:4, each=2),
                   stringsAsFactors=TRUE)
## add the nested individual following the edgeR user guide
colD <- cbind(colD, nind=factor(rep(rep(1:2, each=2), 2)))
colD

  type condition individual nind
1  DIS         A          1    1
2  DIS         B          1    1
3  DIS         A          2    2
4  DIS         B          2    2
5 CTRL         A          3    1
6 CTRL         B          3    1
7 CTRL         A          4    2
8 CTRL         B          4    2
## the model matrix for the experiment
model.matrix(~type + type:nind + type:condition, colD)

  (Intercept) typeDIS typeCTRL:nind2 typeDIS:nind2 typeCTRL:conditionB
1           1       1              0             0                   0
2           1       1              0             0                   0
3           1       1              0             1                   0
4           1       1              0             1                   0
5           1       0              0             0                   0
6           1       0              0             0                   1
7           1       0              1             0                   0
8           1       0              1             0                   1
  typeDIS:conditionB
1                  0
2                  1
3                  0
4                  1
5                  0
6                  0
7                  0
8                  0
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$type
[1] "contr.treatment"

attr(,"contrasts")$nind
[1] "contr.treatment"

attr(,"contrasts")$condition
[1] "contr.treatment"

 

The questions I want to answer in that RNA-Seq experiment are:

1) genes differentially expressed between DIS and CTRL, both cultured at condition A.

2) genes differentially expressed in DIS between B and A.

3) genes differentially expressed in CTRL between B and A.

4) genes differentially regulated between (DIS, B vs A) and (CTRL, B vs A).

Now, to answer question 2) I can extract coefficient "typeDIS:conditionB", to answer 3 "typeCTRL:conditionB" and the contrast between both I can extract with results(..., contrast=c(0, 0, 0, 0, -1, 1)). It is however not clear to me how to answer question 1), i.e. the difference between DIS cells and condition A and CTRL cells and condition A. I first thought that I can use "typeDIS" but to me that looks rather like a comparison between all DIS and all CTRL samples (so conditions A and B pooled).

How can I answer question 1) using the design above?

 

thanks in advance for any comments and answers!

jo

edger deseq2 rna-seq • 1.6k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States

No, typeDIS is estimating the difference between Disease and Control in condition A.

You can see this by noting that the columns of your design are the coefficients, and the rows represent your samples. The fifth row (which corresponds to Control, condition A) has only one 1, in the intercept column. Since this is a Control, condition A sample, this implies that the intercept is estimating the mean of the Control, condition A samples.

The first row of your design has just two 1s, one for the intercept (which we already decided was the mean of the Control, condition A samples), and one for typeDIS. Since this sample is a Disease, condition A sample, typeDIS HAS to be Disease, conditionA - Control, condition A. This is just algebra:

Dis/A = Ctrl/A + (something) => (something) = Dis/A - Ctrl/A

Does that make sense?

 

ADD COMMENT
1
Entering edit mode

There's also an additional subtlety of the individual effect, which confounds the interpretation of the typeDIS term. Consider an example where both samples of individual 1 are highly expressed, while both samples of individual 2 are lowly expressed. Assume that the average expression of individuals 1 and 2 is equal to the average expression of individuals 3 and 4, i.e., no disease effect for condition A (or B). For simplicity, assume that both samples for the same individual have the same expression. Upon fitting, one will find that typeDIS is non-zero and positive while typeDIS:nind2 is large and negative (all other non-intercept coefficients are zero). For example, if we assign the matrix in the original post to design, we can get:

> counts <- c(200, 200, 0, 0, 100, 100, 100, 100)
> glmFit(rbind(counts), design, offset=0, dispersion=0.05)$coef
       (Intercept)   typeDIS typeCTRL:nind2 typeDIS:nind2 typeCTRL:conditionB
counts    4.383276 0.6925228     1.0527e-16     -7.378383          1.0527e-16
       typeDIS:conditionB
counts       -3.41739e-16

Treating typeDIS as the disease effect for condition A would suggest a difference where there is none. Indeed, the example in Section 3.5 in the user's guide never compares samples directly between disease states, but instead compares the log-fold changes (similar to question 4 in the original post). If we were to compare directly between diseases, I would probably use voom with a ~type*condition model and block on individual with duplicateCorrelation.

ADD REPLY
0
Entering edit mode

Thanks Aaron!

There is no way that I could get that working in DESeq2 using e.g. a linear combination of coefficients (e.g. contrast=c(-0.5, 0.5, -0.5, 0.5, 1, -1))?
 

ADD REPLY
0
Entering edit mode

Thanks! Absolutely makes sense. Nice explanation!

However (also regarding Aaron's comment below), typeDis is in fact Disease, conditionA in individual 1, right? So that way, i.e. with the present pairing setup, there is no way I can directly compare Disease vs Control. However, if I didn't have the pairing I could in fact extract the difference Dis vs Ctrl, both condition A from typeDis if I got it correctly.

 

ADD REPLY
1
Entering edit mode

Exactly. I didn't think things all the way through - as Aaron notes, there are some things you can't really get at with this experimental design in the context of a generalized linear model.

ADD REPLY

Login before adding your answer.

Traffic: 645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6