Hi,
I have questions regarding pairwise comparisons for "both between and within subject" RNA-seq experiments analyzed using DESeq2.
My experiment comprises of subjects within block randomly assigned to one of three environmental treatments: 1) CT - control environment condition (n=6), 2) EA - environmental condition A (n=6), and 3) EB - environmental condition B (n=5), in which samples were collected from all subjects at day 14 and 22 of the experiment, so a total of 34 unique samples derived from 17 experimental units.
The experiment was analyzed similarly to what is proposed in the DESeq2 vignette for “Group-specific condition effects, individuals nested within groups” coupled with “Levels without samples”. Below is the source code:
## Frame matrix as needed block <- factor(rep(c(5,1,2,6,4,3,1,4,5,3,6,2,6,2,4,1,5),each=2)) trt <- factor(c(rep("CT",12),rep("EA",12),rep("EB",10))) time <- factor(rep(c(14,22))) coldata <- data.frame(block, trt, time) coldata
block trt time 1 5 CT 14 2 5 CT 22 3 1 CT 14 4 1 CT 22 ... 13 1 EA 14 14 1 EA 22 ... 17 5 EA 14 18 5 EA 22 ... 31 1 EB 14 32 1 EB 22 33 5 EB 14 34 5 EB 22## Model matrix for full model (trt HS is missing a experimental unit, so delete a column that sum=0) m1 <- model.matrix(~trt+block:trt+trt:time, coldata) all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero
(Intercept) trtEA trtEB trtCT:block2 trtEA:block2 trtEB:block2 trtCT:block3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE trtEA:block3 trtEB:block3 trtCT:block4 trtEA:block4 trtEB:block4 trtCT:block5 trtEA:block5 FALSE TRUE FALSE FALSE FALSE FALSE FALSE trtEB:block5 trtCT:block6 trtEA:block6 trtEB:block6 trtCT:time22 trtEA:time22 trtEB:time22 FALSE FALSE FALSE FALSE FALSE FALSE FALSEm1 <- m1[,-which(all.zero)]
## Model matrix for reduced model (trt HS is missing a experimental unit, so delete a column that sum=0) m2 <- model.matrix(~trt+block:trt, coldata) all.zero <- apply(m2, 2, function(x) all(x==0)) all.zero
(Intercept) trtEA trtEB trtCT:block2 trtEA:block2 trtEB:block2 trtCT:block3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE trtEA:block3 trtEB:block3 trtCT:block4 trtEA:block4 trtEB:block4 trtCT:block5 trtEA:block5 FALSE TRUE FALSE FALSE FALSE FALSE FALSE trtEB:block5 trtCT:block6 trtEA:block6 trtEB:block6 FALSE FALSE FALSE FALSEm2 <- m2[,-which(all.zero)]
## Model design dds <- DESeqDataSetFromMatrix(countData=rcount, colData=coldata, design=m1) ## design: contains the full statistical model
## Filtering for genes that non-normalized sum of expression is >=1 in at least 5 samples keep <- rowSums(counts(dds)>=1)>=5 table(keep) dds <- dds[keep,]
## Differential expression analysis dds <- DESeq(dds, test="LRT", reduced=m2) ## reduced: contains the reduced statistical model, which doesn't contain the variable of interest
## Name of columns available as results resultsNames(dds)
[1] "Intercept" "trtEA" "trtEB" "trtCT.block2" "trtEA.block2" "trtEB.block2" [7] "trtCT.block3" "trtEA.block3" "trtCT.block4" "trtEA.block4" "trtEB.block4" "trtCT.block5" [13] "trtEA.block5" "trtEB.block5" "trtCT.block6" "trtEA.block6" "trtEB.block6" "trtCT.time22" [19] "trtEA.time22" "trtEB.time22"## Build matrix of results res_EAvsCT_d14 <- results(dds, test="Wald", name=("trtEA")) res_EBvsCT_d14 <- results(dds, test="Wald", name=("trtEB")) res_EBvsEA_d14 <- results(dds, test="Wald", contrast=list("trtEB","trtEA")) res_EAvsCT_d22 <- results(dds, test="Wald", contrast=list("trtEA.time22","trtCT.time22")) res_EBvsCT_d22 <- results(dds, test="Wald", contrast=list("trtEB.time22","trtCT.time22")) res_EAvsEB_d22 <- results(dds, test="Wald", contrast=list("trtEB.time22","trtEA.time22")) res_d22vs14_CT <- results(dds, test="Wald", name=("trtCT.time22")) res_d22vs14_EA <- results(dds, test="Wald", contrast=list("trtEA.time22","trtEA")) res_d22vs14_EB <- results(dds, test="Wald", contrast=list("trtEB.time22","trtEB"))
I am interested in the following pairwise comparisons:
- EB vs EA on day 14
- EA vs CT on day 22
- EB vs CT on day 22
- EB vs EA on day 22
- Day 22 vs 14 for treatment CT
- Day 22 vs 14 for treatment EA
- Day 22 vs 14 for treatment EB
I read the vignettes() and ?results for DESeq2, and also looked online for questions posted in forums that could help me understand if the contrast terms I am using are adequately calling the comparisons I am interested in, but, so far, I haven’t quite understood it. So, my questions are:
- Are my result contrast terms adequately calling the comparisons I am interested in?
- Do all contrasts of interest account for “CT on day 14” baseline? For example, in “EB vs EA on day 14”, is the actual comparison “EB – EA” or “(EB – CT on day14) – (EA – CT on day 14)”?
I appreciate any help very much.
Thanks,
Ricardo Rodrigues
Hi Michael,
Thank you very much for your help.
I am interested in treatment comparison within each timepoint, as mentioned above, so I'm thinking in analyzing timepoints separately. Thus, I plan to use the following models:
full = ~block + trt
andreduced = ~block
. Would it be a reasonable approach? Any suggestions?Using the proposed models, I get:
For the comparison "EB vs EA", I am using:
results(dds, test="Wald", contrast=c("trt", "EB","EA"))
, although I am wondering if this approach generates the similar results to those ofrelevel(trt, "EA")
. If I am to use the relevel option, do I need to re-run the whole model after releveling or I can I relevel only at the time of extracting the comparison of interest?Thank you,
Ricardo
You should drop "block" from the full and reduced models here. You don't have repeated measurements for the samples when you subset to a single timepoint. It should just be ~trt. And yes you can use contrast to compare EB and EA, you do not need to relevel and re-run DESeq().