Hello all, I've posted with the same design previously, but my statistical analysis plan was unclear. Now that I have more clarity, I have since updated my code to the best of my ability using contrasts and interaction terms to address my 4 goals.
My experimental design is as follows: 96 samples total, each of which are classified under 4 factors (in order of hierarchy):
1. Tissue (brain, pancreas)
2. Genotype (KO, WT)
3. Treatment (Control, Low, High)
4. Sex (M, F)
For now, I am mainly concerned with genotype, tissue, and treatment. My goals are to compare the following:
1. WT control-treatment samples versus KO control-treatment samples within each tissue (i.e. Brain WT control vs Brain KO control; pancreas WT control vs pancreas KO control)
2. WT high-treatment vs WT control-treatment within each tissue (i.e. brain WT high vs brain WT control; pancreas WT high vs pancreas WT control)
3. WT low-treatment vs WT control-treatment within each tissue
4. Repeat 2 and 3 but for KO samples
Q1: I've run condition1:condition2 and condition2:condition1 and they both produced the same result - does order matter for interaction terms?
This begins my updated code:
#Establish design and controls for analysis
dds <- DESeqDataSetFromMatrix(countData = allCounts, colData = allMetadata, design = ~ Genotype + Sex + Treatment + Tissue + Tissue:Treatment + Tissue:Genotype + Genotype:Treatment)
dds$Treatment <- relevel(as.factor(dds$Treatment), "Control")
dds$Genotype <- relevel(as.factor(dds$Genotype), "WT")
dds$Tissue <- relevel(as.factor(dds$Tissue), "Brain")
dds <-DESeq(dds)
resultsNames(dds)
When running this code, I receive the following:
> resultsNames(dds)
[1] "Intercept" "Genotype_KO_vs_WT" "Sex_M_vs_F"
[4] "Treatment_High_vs_Control" "Treatment_Low_vs_Control" "Tissue_Pancreas_vs_Brain"
[7] "TreatmentHigh.TissuePancreas" "TreatmentLow.TissuePancreas" "GenotypeKO.TissuePancreas"
[10] "GenotypeKO.TreatmentHigh" "GenotypeKO.TreatmentLow"
After this, I could use some guidance in terms of syntax and logic. From what I could gather, since I established references by releveling each factor, that means if my factor of interest is already a baseline, it would not be specified in "name = ()" or "contrast = ()"? Based on this understanding, I produced the following code:
#BRAIN: WT High versus Control
#Brain Tissue and WT Genotype are established as baseline -> do not define in contrast
brain_wt_hi <- results(dds, contrast = c("Treatment", "High", "Control"))
brain_wt_hi_sig <- as.data.frame(subset(brain_wt_hi, padj < 0.05))
brain_wt_hi_sig
#Brain WT Low vs Control is similar to the above code, except with "low" instead of "high"
#BRAIN: KO control versus WT control
#Brain Tissue and Control Treatment are established as baseline -> do not define in contrast
brain_ko_wt_ctrl <- results(dds, contrast = c("Genotype", "KO", "WT"))
brain_ko_wt_ctrl_sig <- as.data.frame(subset(brain_ko_wt_ctrl, padj < 0.05))
brain_ko_wt_ctrl_sig
#BRAIN: KO High versus Control
brain_ko_hi <- results(dds, name = ("GenotypeKO.TreatmentHigh"))
brain_ko_hi_sig <- as.data.frame(subset(brain_ko_hi, padj < 0.05))
brain_ko_hi_sig
#Brain KO Low vs Control is similar to the above code, except with "low" instead of "high"
#PANCREAS: WT High versus Control
panc_wt_hi <- results(dds, name ="TreatmentHigh.TissuePancreas")
panc_wt_hi_sig <- as.data.frame(subset(panc_wt_hi, padj < 0.05))
panc_wt_hi_sig
#Pancreas WT Low vs Control is similar to the above code, except with "low" instead of "high"
#PANCREAS: KO control versus WT control
panc_ko_wt_ctrl <- results(dds, name = "GenotypeKO.TissuePancreas")
panc_ko_wt_ctrl_sig <- as.data.frame(subset(panc_ko_wt_ctrl, padj < 0.05))
panc_ko_wt_ctrl_sig
Q2. Is my understanding of interaction terms completely misguided or does it seem like I am headed in the correct direction?
Q3. Alternatively, I've seen designs with fewer factors using the "grouping" approach - I tried my hand at it, but am I correct in thinking that this would not answer my goals?
#The following is my attempt at using a grouping approach, but I believe this is comparing all treatments, not just controls
dds_test <- DESeqDataSetFromMatrix(countData = allCounts, colData = allMetadata, design = ~ Sex + Treatment + Tissue_Geno)
dds_test$Tissue_Geno <- factor(paste0(dds_test$Tissue, dds_test$Genotype))
design(dds_test) <- ~ Tissue_Geno
dds_test <- DESeq(dds_test)
resultsNames(dds_test)
brain_geno_test <- results(dds_test, name=c("Tissue_Geno_BrainWT_vs_BrainKO"))
brain_geno_test_sig <- as.data.frame(subset(brain_geno_test, padj < 0.05))
Along those lines - I don't specifically want to specify brain as the "reference" tissue since the pancreas and brain are being analyzed separately, but both should be included in the same dds. This code made the most sense to me based on my current understanding. So, if anyone is able to share how I can subset 1 of 4 factors (i.e. tissue out of genotype, sex, and treatment), I would really appreciate it!
Previous post
DESeq2 design with multiple factors