Understanding interaction term in DESeq2
1
1
Entering edit mode
@jasontaotaotan-23946
Last seen 6 weeks ago
United States

Hi everyone,

For DESeq2, I am trying to understand the interpretation of with/without an interaction term in the design formula. Assume we have two conditions (Treatment vs Control) and two genotypes (Mutant vs WT). Here is a simulation study and my interpretation.

> library("DESeq2")
# make example dataset and fix the reference level
> dds <- makeExampleDESeqDataSet(n=10000,m=12)
> dds$condition <- as.factor(c(rep("Control",6), rep("Treatment",6))) > dds$genotype <- as.factor(rep(c(rep("WT", 3), rep("Mutant",3)),2))
> dds$genotype <- relevel(dds$genotype, "WT")

> as.data.frame(colData(dds))


## Case1: With interaction

I could include the interaction term in the design, as I showed in the following code. Then I would retrieve the result of conditionTreatmentvs_Control. From what I understand, this is comparing Treatment and Control under WT, or comparing sample 7-9 with sample 1-3.

> design(dds) <- ~ condition + genotype + condition:genotype
> dds <- DESeq(dds)
> results(dds, name = "condition_Treatment_vs_Control")[1:5,]


## Case2: Without Interaction

I could write the design formula without the interaction term and still, I want to retrieve the result of conditionTreatmentvs_Control. This is technically the same as model the batch effect. DESeq2 will regress out the effect of genotype. In other words, DESeq2 will take all samples into consideration.

> design(dds) <- ~ condition + genotype
> dds <- DESeq(dds)
> results(dds, name = "condition_Treatment_vs_Control")[1:5,]


But if you look at the results, they are very different. I want to validate my point by manually calculate the log2FoldChange column.

Here I try to calculate the log2foldchange for gene 1, with the interaction term included in the design formula

> log2(mean(counts(dds, normalized = T)[1,7:9]) / mean(counts(dds, normalized = T)[1,1:3]))


This is almost the same as the result of DESeq2 (with interaction).

So here is my question: 1. Is my interpretation of the design formula correct? 2. How to manually calculate the log2foldchange for case2 (without interaction)?

Thanks all.

deseq2 design formula log fold change • 293 views
2
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

1) Yes, and we do have a diagram in the vignette section on interactions which you might look at.

2) It is not trivial to calculate the LFC for case 2 for a GLM. So there isn't really a surprise for me that cases 1 and 2 give different answers.

0
Entering edit mode

Thanks Michael. This helped with my understanding to DESeq2. I have two additional questions regarding shrinkage estimators. It will be great if you can give me some suggestions.

1. "apeglm" and "ashr" are both mentioned in the vignette. Do you have suggestions on which method to choose in which specific case? Or both are general methods that could apply to every situation.

2. You mentioned that shrinkage estimators are not recommended for design with the interaction term. If I have condition (control vs treat) and genotype (mutant and WT) in my design matrix and I specify design formula as ~ condition * genotype. Do you mean estimator shouldn’t be use in all resultsNames(dds) include "Intercept", "condition_Treat_vs_Control", "genotype_Mutant_vs_WT", "conditionTreat.genotypeMutant"? Or do you mean estimator shouldn't be use in "conditionTreat.genotypeMutant"?

1
Entering edit mode

(1) Both are general methods, but there are some cases where only ashr can be used. In the extended section there is a table summarizing this.

(2) Shrinkage estimators can now be used with interaction terms with apeglm and ashr (that's actually a row in the table I'm referring to). We had found that we could not reliably use them with the 2014 estimator, but we resolved that issue with the new methods.