Dear All!
I have the following setup for an RNA-seq experiment: 4 individuals, 2 of them have a disease, 2 are healthy controls, cells from each individual were cultured using two different conditions. So, the setup is very similar to the one described in the edgeR user guide (Section 3.5) or in the DESeq2 vignette (Section 3.12.1). Following the edgeR user guide I generated the design matrix (CTRL are the healthy controls, DIS the patients, condition A and B code for the two different cell culture conditions):
colD <- data.frame(type=rep(c("DIS", "CTRL"), each=4),
condition=rep(c("A", "B"), 4),
individual=rep(1:4, each=2),
stringsAsFactors=TRUE)
## add the nested individual following the edgeR user guide
colD <- cbind(colD, nind=factor(rep(rep(1:2, each=2), 2)))
colD
type condition individual nind
1 DIS A 1 1
2 DIS B 1 1
3 DIS A 2 2
4 DIS B 2 2
5 CTRL A 3 1
6 CTRL B 3 1
7 CTRL A 4 2
8 CTRL B 4 2
## the model matrix for the experiment
model.matrix(~type + type:nind + type:condition, colD)
(Intercept) typeDIS typeCTRL:nind2 typeDIS:nind2 typeCTRL:conditionB
1 1 1 0 0 0
2 1 1 0 0 0
3 1 1 0 1 0
4 1 1 0 1 0
5 1 0 0 0 0
6 1 0 0 0 1
7 1 0 1 0 0
8 1 0 1 0 1
typeDIS:conditionB
1 0
2 1
3 0
4 1
5 0
6 0
7 0
8 0
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$type
[1] "contr.treatment"
attr(,"contrasts")$nind
[1] "contr.treatment"
attr(,"contrasts")$condition
[1] "contr.treatment"
The questions I want to answer in that RNA-Seq experiment are:
1) genes differentially expressed between DIS and CTRL, both cultured at condition A.
2) genes differentially expressed in DIS between B and A.
3) genes differentially expressed in CTRL between B and A.
4) genes differentially regulated between (DIS, B vs A) and (CTRL, B vs A).
Now, to answer question 2) I can extract coefficient "typeDIS:conditionB", to answer 3 "typeCTRL:conditionB" and the contrast between both I can extract with results(..., contrast=c(0, 0, 0, 0, -1, 1)). It is however not clear to me how to answer question 1), i.e. the difference between DIS cells and condition A and CTRL cells and condition A. I first thought that I can use "typeDIS" but to me that looks rather like a comparison between all DIS and all CTRL samples (so conditions A and B pooled).
How can I answer question 1) using the design above?
thanks in advance for any comments and answers!
jo

There's also an additional subtlety of the individual effect, which confounds the interpretation of the
typeDISterm. Consider an example where both samples of individual 1 are highly expressed, while both samples of individual 2 are lowly expressed. Assume that the average expression of individuals 1 and 2 is equal to the average expression of individuals 3 and 4, i.e., no disease effect for condition A (or B). For simplicity, assume that both samples for the same individual have the same expression. Upon fitting, one will find thattypeDISis non-zero and positive whiletypeDIS:nind2is large and negative (all other non-intercept coefficients are zero). For example, if we assign the matrix in the original post todesign, we can get:Treating
typeDISas the disease effect for condition A would suggest a difference where there is none. Indeed, the example in Section 3.5 in the user's guide never compares samples directly between disease states, but instead compares the log-fold changes (similar to question 4 in the original post). If we were to compare directly between diseases, I would probably usevoomwith a~type*conditionmodel and block onindividualwithduplicateCorrelation.Thanks Aaron!
There is no way that I could get that working in
DESeq2using e.g. a linear combination of coefficients (e.g.contrast=c(-0.5, 0.5, -0.5, 0.5, 1, -1))?Thanks! Absolutely makes sense. Nice explanation!
However (also regarding Aaron's comment below),
typeDisis in fact Disease, conditionA in individual 1, right? So that way, i.e. with the present pairing setup, there is no way I can directly compare Disease vs Control. However, if I didn't have the pairing I could in fact extract the difference Dis vs Ctrl, both condition A fromtypeDisif I got it correctly.Exactly. I didn't think things all the way through - as Aaron notes, there are some things you can't really get at with this experimental design in the context of a generalized linear model.