I have questions regarding the use of an interaction term versus grouping with different betaPrior arguments in DESeq2 version 1.14.1.
I'm still using DESeq2 1.14.1. I've been working on differential expression analysis of drought tolerance in rice. I have 2 genotypes (tolerant and susceptible) and 2 conditions (drought and well-watered) with 4 replicates each, essentially a 2x2 factorial experiment with 4 replicates per cell.
Now we want to identify genes that are uniquely upregulated or downregulated in each genotype under drought. Specifically, we want to identify genes in the QTL region of the highly drought-tolerant genotype that are upregulated or downregulated, and then use these genes for functional validation through CRISPR-Cpf1. One of the major hypotheses we want to test is that there are differentially expressed genes (DEGs) between the two genotypes in the QTL region under drought, and we want to identify them, in particular the DEGs under drought in the tolerant genotype. Since we don't know the mechanisms of drought tolerance at the reproductive stage, we set up contrasts to identify DEGs between several groups.
I set up the code as follows:
colData <- data.frame(genotype=rep(c("IL","Swarna"),each=8),
                      condition=rep(rep(c("Control","Drought"),each=4),times=2))
rownames(colData) <- colnames(tx.all$counts)
dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~genotype+condition+genotype:condition))
colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~group
Question 1: According to the vignette, "Using the design is similar to adding an interaction term". Is there any conceptual difference between using the grouping variable and using the interaction design in a multi-factor experiment?
Without grouping, using the interaction term, resultsNames() gives this with betaPrior=FALSE:
 "Intercept" "genotype_Swarna_vs_IL" "condition_Drought_vs_Control"
With grouping and betaPrior=TRUE, I got these four groups:
dds <- DESeq(dds, betaPrior = TRUE, parallel = TRUE)
resultsNames(dds)
# "Intercept" "groupILControl" "groupILDrought" "groupSwarnaControl" "groupSwarnaDrought"
With grouping using betaPrior=FALSE, I got these groups:
 "Intercept" "group_ILDrought_vs_ILControl" "group_SwarnaControl_vs_ILControl"
I ended up using the grouping with betaPrior=TRUE and did pairwise comparisons using results() with contrasts on the group variable.
If I want to account for the differences between conditions and genotypes in understanding the transcriptional regulation of rice under reproductive-stage drought stress, is grouping ("combine the factors of interest into a single factor with all combinations of the original factors") a logical approach to modeling multiple condition and genotype effects?
The way I extracted this information is given below:
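To make the comparison between the two designs concrete, here is a minimal base-R sketch (not DESeq2 itself). It rebuilds a sample table with the same 2x2 layout as my colData and checks whether the interaction design and the grouped design span the same column space. It uses the standard model.matrix() coding, so it says nothing about the expanded model matrix that betaPrior=TRUE uses internally; that part is outside this sketch.

```r
# Sample table mirroring the 2x2 layout described above (4 replicates per cell)
colData <- data.frame(
  genotype  = factor(rep(c("IL", "Swarna"), each = 8)),
  condition = factor(rep(rep(c("Control", "Drought"), each = 4), times = 2))
)
colData$group <- factor(paste0(colData$genotype, colData$condition))

# Design matrices for the interaction formulation and the grouped formulation
X_int <- model.matrix(~ genotype + condition + genotype:condition, colData)
X_grp <- model.matrix(~ group, colData)

# Rank of each design on its own
rank_int <- qr(X_int)$rank
rank_grp <- qr(X_grp)$rank

# If both designs span the same space, stacking their columns
# side by side cannot increase the rank
rank_both <- qr(cbind(X_int, X_grp))$rank

print(c(interaction = rank_int, group = rank_grp, combined = rank_both))
```

If the three ranks agree, the two designs fit the same cell means and differ only in how the coefficients are parameterized, which is how I currently read the vignette statement.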
res.05_NILD_NILC <- results(dds, contrast=c("group","ILDrought", "ILControl"), alpha=.05, parallel = TRUE)
res.05_SWAD_SWAC <- results(dds, contrast=c("group","SwarnaDrought", "SwarnaControl"), alpha=.05, parallel = TRUE)
res.05_NILC_SWAC <- results(dds, contrast=c("group","ILControl", "SwarnaControl"), alpha=.05, parallel = TRUE)
res.05_NILD_SWAD <- results(dds, contrast=c("group","ILDrought", "SwarnaDrought"), alpha=.05, parallel = TRUE)
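As a check on how these pairwise group contrasts relate to the interaction term, here is a small base-R sketch using lm() on simulated values for a single hypothetical gene. It is only an analogy on an ordinary linear model: the data are random, the variable names d and y are made up for illustration, and no DESeq2 shrinkage or negative binomial fitting is involved.

```r
set.seed(1)

# Same 2x2 layout as above, with simulated log-scale values for one made-up gene
d <- data.frame(
  genotype  = factor(rep(c("IL", "Swarna"), each = 8)),
  condition = factor(rep(rep(c("Control", "Drought"), each = 4), times = 2))
)
d$group <- factor(paste0(d$genotype, d$condition))
d$y <- rnorm(16)

# Interaction parameterization and one-mean-per-group parameterization
fit_int <- lm(y ~ genotype * condition, data = d)
fit_grp <- lm(y ~ 0 + group, data = d)

# Drought effect within each genotype, built from the group means
b <- coef(fit_grp)
drought_effect_IL     <- b[["groupILDrought"]]     - b[["groupILControl"]]
drought_effect_Swarna <- b[["groupSwarnaDrought"]] - b[["groupSwarnaControl"]]

# Difference of the two drought effects versus the fitted interaction coefficient
interaction_from_groups <- drought_effect_Swarna - drought_effect_IL
interaction_from_model  <- coef(fit_int)[["genotypeSwarna:conditionDrought"]]

print(all.equal(interaction_from_groups, interaction_from_model))
```

In this toy setting the interaction coefficient is the difference between the two within-genotype drought effects, so the grouped design still lets me ask whether the drought response differs between genotypes by combining the pairwise comparisons.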
You mentioned in one of the threads that "lfcShrink() gives the identical moderated LFCs as DESeq() gave in previous versions." and "If you want to obtain (nearly) the same results in version 1.16 as in 1.14 you can do: dds <- DESeq(dds, betaPrior=TRUE)." So would running version 1.14.1 with dds <- DESeq(dds, betaPrior=TRUE) and grouping give (nearly) the same results as using lfcShrink() in version 1.16.1?