How to handle batch effect and interaction with experimental variables
Entering edit mode
A • 0
Last seen 3.2 years ago
United States

I have the following experimental design—

Experimental Variable A (EVA, six levels) Experimental Variable B (EVB, two levels) Time Variable (four levels) Batch Variable (Year, two levels)

The design is balanced and I have four biological replicates for each possible combination of the above four variable levels (384 samples total)

I’m interested in the effects of Experimental Variables and Time, and want to account for the batch effect.

I currently have the following Full model:

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, 
                              design = ~ year + time + EVA + EVB + 
                                      year:time + year: EVA + year: EVB +
                                      time: EVA + time: EVB + EVA: EVB +
                                      year:time: EVA + year:time: EVB + year: EVA: EVB +
                                      time: EVA: EVB + year:time:EVA:EVB, parallel=TRUE )

I will probably have a significant interaction between Batch and the Experimental Variables A & B and am using the LRT to test whether "year:time:EVA:EVB" (and other interaction terms that include the Batch variable) should be included in the model:

 dds.lrt.1 <- DESeq(ddsClean, reduced=  ~ year + time + EVA + EVB + 
                                      year:time + year: EVA + year: EVB +
                                      time: EVA + time: EVB + EVA: EVB +
                                      year:time: EVA + year:time: EVB + year: EVA: EVB +
                                      time: EVA: EVB, test="LRT", parallel=TRUE )
res <- results(dds.lrt.1)
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 3591

Someone told me that if Batch interacts with the Experimental Variables, I cannot compare levels of each experimental variable without first splitting my data into the two separate batches, doing comparisons within the years, and then intersecting the lists of differentially expressed from each year. This doesn’t seem quite right, but I'm not sure what to do.

My Baseline is: EVA L1 EVB L1 Time 1 Year 1

I am interested in comparing specific (1) levels of EVA given the same EVB and time (2) levels of EVB given the same EVA and time, and (3) whether the change between pairs of consecutive times is equal between different groups.

As two examples (of many), if I do not include a batch effect, my results() look like this for each of the following comparisons:

# Effect of EVA L2 (at Time 4 and EVB L1)
First <- results(ddsClean, contrast = list(c("EVA_L2_vs_L1", "time4.EVAL2")), pAdjustMethod="fdr")

# EVB L1, EVA L1, Time 2 --> Time3 vs. EVB L2, EVA L1, Time 2 --> Time3 
Second <- results(ddsClean, contrast = list(

Provided this is correct, my understanding is that although year-related coefficients in resultsNames(ddsClean) aren't included in my results() extracts, I'm comparing samples only in Year 1 because that's the baseline for that variable.

I'm not interested in comparing years; how would I do the same group-specific sorts of comparisons but across years / by accounting for the batch effect?

deseq2 statistics differential expression • 858 views
Entering edit mode
Last seen 20 hours ago
United States

This appears to be a question about the linear modeling, and you have a very complex dataset with many analysis choices. I would highly recommend working with a statistician on this design and analysis plan. I don't have time to provide non-software related questions on the site at this time.


Login before adding your answer.

Traffic: 878 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6