I have the following experimental design—
Experimental Variable A (EVA, six levels) Experimental Variable B (EVB, two levels) Time Variable (four levels) Batch Variable (Year, two levels)
The design is balanced and I have four biological replicates for each possible combination of the above four variable levels (384 samples total)
I’m interested in the effects of Experimental Variables and Time, and want to account for the batch effect.
I currently have the following Full model:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
design = ~ year + time + EVA + EVB +
year:time + year: EVA + year: EVB +
time: EVA + time: EVB + EVA: EVB +
year:time: EVA + year:time: EVB + year: EVA: EVB +
time: EVA: EVB + year:time:EVA:EVB, parallel=TRUE )
I will probably have a significant interaction between Batch and the Experimental Variables A & B and am using the LRT to test whether "year:time:EVA:EVB" (and other interaction terms that include the Batch variable) should be included in the model:
dds.lrt.1 <- DESeq(ddsClean, reduced= ~ year + time + EVA + EVB +
year:time + year: EVA + year: EVB +
time: EVA + time: EVB + EVA: EVB +
year:time: EVA + year:time: EVB + year: EVA: EVB +
time: EVA: EVB, test="LRT", parallel=TRUE )
res <- results(dds.lrt.1)
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 3591
Someone told me that if Batch interacts with the Experimental Variables, I cannot compare levels of each experimental variable without first splitting my data into the two separate batches, doing comparisons within the years, and then intersecting the lists of differentially expressed from each year. This doesn’t seem quite right, but I'm not sure what to do.
My Baseline is: EVA L1 EVB L1 Time 1 Year 1
I am interested in comparing specific (1) levels of EVA given the same EVB and time (2) levels of EVB given the same EVA and time, and (3) whether the change between pairs of consecutive times is equal between different groups.
As two examples (of many), if I do not include a batch effect, my results() look like this for each of the following comparisons:
# Effect of EVA L2 (at Time 4 and EVB L1)
First <- results(ddsClean, contrast = list(c("EVA_L2_vs_L1", "time4.EVAL2")), pAdjustMethod="fdr")
# EVB L1, EVA L1, Time 2 --> Time3 vs. EVB L2, EVA L1, Time 2 --> Time3
Second <- results(ddsClean, contrast = list(
c("time2.EVBL2"),
c("time3.EVBL2")),
pAdjustMethod="fdr")
Provided this is correct, my understanding is that although year-related coefficients in resultsNames(ddsClean) aren't included in my results() extracts, I'm comparing samples only in Year 1 because that's the baseline for that variable.
I'm not interested in comparing years; how would I do the same group-specific sorts of comparisons but across years / by accounting for the batch effect?