Question: Pairwise comparisons for "both between and within subject"experiment using DESeq2
0
6 months ago by
USA
Ricardo Rodrigues0 wrote:

Hi,

I have questions regarding pairwise comparisons for "both between and within subject" RNA-seq experiments analyzed using DESeq2.

My experiment comprises of subjects within block randomly assigned to one of three environmental treatments: 1) CT - control environment condition (n=6), 2) EA - environmental condition A (n=6), and 3) EB - environmental condition B (n=5), in which samples were collected from all subjects at day 14 and 22 of the experiment, so a total of 34 unique samples derived from 17 experimental units.

The experiment was analyzed similarly to what is proposed in the DESeq2 vignette for “Group-specific condition effects, individuals nested within groups” coupled with “Levels without samples”. Below is the source code:

## Frame matrix as needed
block <- factor(rep(c(5,1,2,6,4,3,1,4,5,3,6,2,6,2,4,1,5),each=2))
trt <- factor(c(rep("CT",12),rep("EA",12),rep("EB",10)))
time <- factor(rep(c(14,22)))
coldata <- data.frame(block, trt, time)
coldata
  block trt time
1      5  CT   14
2      5  CT   22
3      1  CT   14
4      1  CT   22
...
13     1  EA   14
14     1  EA   22
...
17     5  EA   14
18     5  EA   22
...
31     1  EB   14
32     1  EB   22
33     5  EB   14
34     5  EB   22

## Model matrix for full model (trt HS is missing a experimental unit, so delete a column that sum=0)
m1 <- model.matrix(~trt+block:trt+trt:time, coldata)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
(Intercept)        trtEA        trtEB trtCT:block2 trtEA:block2 trtEB:block2 trtCT:block3
FALSE        FALSE        FALSE        FALSE        FALSE        FALSE        FALSE
trtEA:block3 trtEB:block3 trtCT:block4 trtEA:block4 trtEB:block4 trtCT:block5 trtEA:block5
FALSE         TRUE        FALSE        FALSE        FALSE        FALSE        FALSE
trtEB:block5 trtCT:block6 trtEA:block6 trtEB:block6 trtCT:time22 trtEA:time22 trtEB:time22
FALSE        FALSE        FALSE        FALSE        FALSE        FALSE        FALSE
m1 <- m1[,-which(all.zero)]

## Model matrix for reduced model (trt HS is missing a experimental unit, so delete a column that sum=0)
m2 <- model.matrix(~trt+block:trt, coldata)
all.zero <- apply(m2, 2, function(x) all(x==0))
all.zero
 (Intercept)        trtEA        trtEB trtCT:block2 trtEA:block2 trtEB:block2 trtCT:block3
FALSE        FALSE        FALSE        FALSE        FALSE        FALSE        FALSE
trtEA:block3 trtEB:block3 trtCT:block4 trtEA:block4 trtEB:block4 trtCT:block5 trtEA:block5
FALSE         TRUE        FALSE        FALSE        FALSE        FALSE        FALSE
trtEB:block5 trtCT:block6 trtEA:block6 trtEB:block6
FALSE        FALSE        FALSE        FALSE 
m2 <- m2[,-which(all.zero)]

## Model design
dds <- DESeqDataSetFromMatrix(countData=rcount, colData=coldata, design=m1)
## design: contains the full statistical model

## Filtering for genes that non-normalized sum of expression is >=1 in at least 5 samples
keep <- rowSums(counts(dds)>=1)>=5
table(keep)
dds <- dds[keep,]

## Differential expression analysis
dds <- DESeq(dds, test="LRT", reduced=m2)
## reduced: contains the reduced statistical model, which doesn't contain the variable of interest

## Name of columns available as results
resultsNames(dds)
 [1] "Intercept"    "trtEA"        "trtEB"        "trtCT.block2" "trtEA.block2" "trtEB.block2"
[7] "trtCT.block3" "trtEA.block3" "trtCT.block4" "trtEA.block4" "trtEB.block4" "trtCT.block5"
[13] "trtEA.block5" "trtEB.block5" "trtCT.block6" "trtEA.block6" "trtEB.block6" "trtCT.time22"
[19] "trtEA.time22" "trtEB.time22"

## Build matrix of results
res_EAvsCT_d14 <- results(dds, test="Wald", name=("trtEA"))
res_EBvsCT_d14 <- results(dds, test="Wald", name=("trtEB"))
res_EBvsEA_d14 <- results(dds, test="Wald", contrast=list("trtEB","trtEA"))
res_EAvsCT_d22 <- results(dds, test="Wald", contrast=list("trtEA.time22","trtCT.time22"))
res_EBvsCT_d22 <- results(dds, test="Wald", contrast=list("trtEB.time22","trtCT.time22"))
res_EAvsEB_d22 <- results(dds, test="Wald", contrast=list("trtEB.time22","trtEA.time22"))
res_d22vs14_CT <- results(dds, test="Wald", name=("trtCT.time22"))
res_d22vs14_EA <- results(dds, test="Wald", contrast=list("trtEA.time22","trtEA"))
res_d22vs14_EB <- results(dds, test="Wald", contrast=list("trtEB.time22","trtEB"))

I am interested in the following pairwise comparisons:

1. EB vs EA on day 14
2. EA vs CT on day 22
3. EB vs CT on day 22
4. EB vs EA on day 22
5. Day 22 vs 14 for treatment CT
6. Day 22 vs 14 for treatment EA
7. Day 22 vs 14 for treatment EB

I read the vignettes() and ?results for DESeq2, and also looked online for questions posted in forums that could help me understand if the contrast terms I am using are adequately calling the comparisons I am interested in, but, so far, I haven’t quite understood it. So, my questions are:

1. Are my result contrast terms adequately calling the comparisons I am interested in?
2. Do all contrasts of interest account for “CT on day 14” baseline? For example, in “EB vs EA on day 14”, is the actual comparison “EB – EA” or “(EB – CT on day14) – (EA – CT on day 14)”?

I appreciate any help very much.
Thanks,
Ricardo Rodrigues

modified 6 months ago by Michael Love24k • written 6 months ago by Ricardo Rodrigues0
Answer: Pairwise comparisons for "both between and within subject"experiment using DESeq
0
6 months ago by
Michael Love24k
United States
Michael Love24k wrote:

Hi Ricardo,

You can make comparisons 5-7 using results() with the “name” argument and the interaction term, eg "trtCT.time22” for the 22 vs 14 effect for control. The comparison 1-4 are not possible with this design, because you can’t control for the block/subject with fixed effects and compare across groups with different blocks/subjects. You could take an alternative approach with limma voom and the duplicateCorrelation() function, but you can’t make these comparisons with fixed effects models.

Hi Michael,

Thank you very much for your help.

I am interested in treatment comparison within each timepoint, as mentioned above, so I'm thinking in analyzing timepoints separately. Thus, I plan to use the following models: full = ~block + trt and reduced = ~block. Would it be a reasonable approach? Any suggestions?

Using the proposed models, I get:

resultsNames(dds)
[1] "Intercept"    "blk_2_vs_1"   "blk_3_vs_1"   "blk_4_vs_1"   "blk_5_vs_1"   "blk_6_vs_1"   "trt_EA_vs_CT"
[8] "trt_EB_vs_CT"

For the comparison "EB vs EA", I am using: results(dds, test="Wald", contrast=c("trt", "EB","EA")), although I am wondering if this approach generates the similar results to those of relevel(trt, "EA"). If I am to use the relevel option, do I need to re-run the whole model after releveling or I can I relevel only at the time of extracting the comparison of interest?

Thank you,
Ricardo