I am using DESeq2 to analyse RNA-seq data and very appreciated its comprehensive functionality. In my data, the counts are affected by genotype and cell type. I think there is interaction between genotype and cell type. Therefore, interaction term should be considered. In the help pages of function "results", there are two options for doing this. One is adding the interaction term explicitly in the formula, the other is combing two factors into one factor named group. I run the examples provided in the help page and found that two options output different results. For example, log2 fold change are different. So, what's reason or rationale underlying this situation?
if you want differences between groups, a grouping vector is more straightforward, if you interested in interaction effects, a model with interaction terms is easier to interpret.
Does that means the absolute value of log fold change of gene computed by design with interaction term is usually larger than the absolute value of LFC computed by design with grouping variable?
No, this is not usually the case. The-fold-change shrinkage will primarily shrink fold changes for lowly expressed genes, as their fold changes are more variable than the fold changes of highly expressed genes. Following the NB model, the log fold changes have a variance that is very roughly proportional to
If you run DESeq2 with an interaction term in the design formula, the beta-shrinkage is turned off, i.e. the fold changes are not shrunken. This is also mentioned in the examples section of the results:
# design with interactions terms by default have betaPrior=FALSE
However, if you take your two factors and combine them into a group vector, the beta-shrinkage is performed, leading to different fold change estimates. See also section 3.3. of the vignette for a thorough discussion of this.
I got it. Thanks Bernd so much!
Added question: which option do you recommend? interaction term in the design formula or take your two factors and combine them into a group vector ?
This depends on what you want:
if you want differences between groups, a grouping vector is more straightforward, if you interested in interaction effects, a model with interaction terms is easier to interpret.
Does that means the absolute value of log fold change of gene computed by design with interaction term is usually larger than the absolute value of LFC computed by design with grouping variable?
No, this is not usually the case. The-fold-change shrinkage will primarily shrink fold changes for lowly expressed genes, as their fold changes are more variable than the fold changes of highly expressed genes. Following the NB model, the log fold changes have a variance that is very roughly proportional to
var(logFC) ~ 1/(raw gene expression counts) + dispersion
So a some proportion but not all (!) lowly expressed genes will have large fold changes by chance. Those are then shrunken.
So in a nutshell, the difference will mainly be visible in a certain proportion of the lowly expressed genes.