Question

How to handle batch effect and interaction with experimental variables

0

Entering edit mode

A • 0

@a-7557

Last seen 4.7 years ago

United States

I have the following experimental design—

Experimental Variable A (EVA, six levels) Experimental Variable B (EVB, two levels) Time Variable (four levels) Batch Variable (Year, two levels)

The design is balanced and I have four biological replicates for each possible combination of the above four variable levels (384 samples total)

I’m interested in the effects of Experimental Variables and Time, and want to account for the batch effect.

I currently have the following Full model:

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, 
                              design = ~ year + time + EVA + EVB + 
                                      year:time + year: EVA + year: EVB +
                                      time: EVA + time: EVB + EVA: EVB +
                                      year:time: EVA + year:time: EVB + year: EVA: EVB +
                                      time: EVA: EVB + year:time:EVA:EVB, parallel=TRUE )

I will probably have a significant interaction between Batch and the Experimental Variables A & B and am using the LRT to test whether "year:time:EVA:EVB" (and other interaction terms that include the Batch variable) should be included in the model:

 dds.lrt.1 <- DESeq(ddsClean, reduced=  ~ year + time + EVA + EVB + 
                                      year:time + year: EVA + year: EVB +
                                      time: EVA + time: EVB + EVA: EVB +
                                      year:time: EVA + year:time: EVB + year: EVA: EVB +
                                      time: EVA: EVB, test="LRT", parallel=TRUE )
res <- results(dds.lrt.1)
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 3591

Someone told me that if Batch interacts with the Experimental Variables, I cannot compare levels of each experimental variable without first splitting my data into the two separate batches, doing comparisons within the years, and then intersecting the lists of differentially expressed from each year. This doesn’t seem quite right, but I'm not sure what to do.

My Baseline is: EVA L1 EVB L1 Time 1 Year 1

I am interested in comparing specific (1) levels of EVA given the same EVB and time (2) levels of EVB given the same EVA and time, and (3) whether the change between pairs of consecutive times is equal between different groups.

As two examples (of many), if I do not include a batch effect, my results() look like this for each of the following comparisons:

# Effect of EVA L2 (at Time 4 and EVB L1)
First <- results(ddsClean, contrast = list(c("EVA_L2_vs_L1", "time4.EVAL2")), pAdjustMethod="fdr")

# EVB L1, EVA L1, Time 2 --> Time3 vs. EVB L2, EVA L1, Time 2 --> Time3 
Second <- results(ddsClean, contrast = list(
    c("time2.EVBL2"),
    c("time3.EVBL2")),
    pAdjustMethod="fdr")

Provided this is correct, my understanding is that although year-related coefficients in resultsNames(ddsClean) aren't included in my results() extracts, I'm comparing samples only in Year 1 because that's the baseline for that variable.

I'm not interested in comparing years; how would I do the same group-specific sorts of comparisons but across years / by accounting for the batch effect?

deseq2 statistics differential expression • 1.3k views

ADD COMMENT • link 6.1 years ago A • 0

score 0 · Answer 1 · 2019-11-12

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

This appears to be a question about the linear modeling, and you have a very complex dataset with many analysis choices. I would highly recommend working with a statistician on this design and analysis plan. I don't have time to provide non-software related questions on the site at this time.

ADD COMMENT • link 6.1 years ago Michael Love 43k