Hello,
I am trying to determine the best DESeq2 design formula for my study, but I'm a bit unsure if the design I am using is appropriate for what I want to test. My study consists of 18 samples; 3 tissues each taken from 3 treatment animals and 3 tissues each taken from 3 control animals:
animal | sex | treatment_day | developmental_stage | group |
1 | M | 1 | A | Tissue1_Control |
1 | M | 1 | A | Tissue2_Control |
1 | M | 1 | A | Tissue3_Control |
2 | F | 2 | C | Tissue1_Control |
2 | F | 2 | C | Tissue2_Control |
2 | F | 2 | C | Tissue3_Control |
3 | F | 1 | B | Tissue1_Control |
3 | F | 1 | B | Tissue2_Control |
3 | F | 1 | B | Tissue3_Control |
4 | M | 2 | B | Tissue1_Treatment |
4 | M | 2 | B | Tissue2_Treatment |
4 | M | 2 | B | Tissue3_Treatment |
5 | F | 2 | B | Tissue1_Treatment |
5 | F | 2 | B | Tissue2_Treatment |
5 | F | 2 | B | Tissue3_Treatment |
6 | F | 1 | A | Tissue1_Treatment |
6 | F | 1 | A | Tissue2_Treatment |
6 | F | 1 | A | Tissue3_Treatment |
I want to determine the effect of the treatment on the three different tissues while controlling for treatment_day, sex, and developmental_stage. I decided to use a grouping variable for each sample (i.e., Tissue1_Treatment, Tissue1_Control, Tissue2_Treatment, Tissue2_Control, etc.) for straightforward contrasting. To control for the potential confounding variables, I implemented a design of ~treatment_day+sex+developmental_stage+group.
If I want to determine the DEGs from Tissue1_Treatment compared to Tissue1_Control and use contrast=list("Tissue1_Treatment","Tissue1_Control"), am I correct in interpreting the output as "DEGs in response to Treatment while controlling for treatment_day, sex, and developmental_stage"? I've seen plenty of posts in which people use the ~group design, but I haven't come across any that use other variables alongside the grouping variable.
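For reference, here is a minimal sketch of the setup described above, not the actual analysis. It assumes DESeq2 is installed, uses simulated counts from makeExampleDESeqDataSet() in place of the real count matrix, and the object names (animals, coldata, cts, dds, res) and the helper columns treatment and tissue are made up for illustration. The contrast is written in the character-vector form c("group", numerator, denominator), because the list() form of contrast expects coefficient names as they appear in resultsNames(dds) rather than bare factor levels.

```r
library(DESeq2)

# Sample table transcribed from the question: six animals, three tissues each.
animals <- data.frame(
  animal = factor(1:6),
  sex = factor(c("M", "F", "F", "M", "F", "F")),
  treatment_day = factor(c(1, 2, 1, 2, 2, 1)),
  developmental_stage = factor(c("A", "C", "B", "B", "B", "A")),
  treatment = c("Control", "Control", "Control",
                "Treatment", "Treatment", "Treatment")
)
coldata <- animals[rep(1:6, each = 3), ]
coldata$tissue <- rep(paste0("Tissue", 1:3), times = 6)
coldata$group <- factor(paste(coldata$tissue, coldata$treatment, sep = "_"))

# ASSUMPTION: simulated counts stand in for the real data (1000 genes x 18 samples).
set.seed(1)
cts <- counts(makeExampleDESeqDataSet(n = 1000, m = 18))
rownames(coldata) <- colnames(cts)

dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~ treatment_day + sex + developmental_stage + group
)
dds <- DESeq(dds)

# Coefficient names available to the list() form of `contrast`.
print(resultsNames(dds))

# Tissue1 treatment vs. control, adjusted for the other terms in the design.
res <- results(dds, contrast = c("group", "Tissue1_Treatment", "Tissue1_Control"))
print(head(res))
```

With the default alphabetical factor levels, Tissue1_Control is the reference level of group, so the same comparison also appears as a single coefficient in resultsNames(dds).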
Thanks in advance for any help!
Thank you, Michael! One last clarification: because only one animal was at developmental_stage C, C is only present in treatment_day 2, the female category, and the control tissues (i.e., Tissue1_Control, Tissue2_Control, Tissue3_Control). However, developmental_stages A and B are spread across both treatment_days, sexes, and control/treatment tissues. Is it still OK to include developmental_stage in the design or will this lead to issues in the DE analysis?
Thanks again!
As long as they are not confounded, they can be estimated as additive effects and controlled for when testing on group. From the table above, it looks like they are not confounded (and DESeq2 will stop with a "model matrix is not full rank" error if they are).
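If it helps, one way to check this before running DESeq2 is to build the model matrix from the sample table and compare its rank with its number of columns. The sketch below uses only base R. The coldata data frame is a transcription of the table in the question, and the helper columns treatment and tissue, as well as the object names, are assumptions made for illustration.

```r
# Sample table transcribed from the question: six animals, three tissues each.
animals <- data.frame(
  animal = factor(1:6),
  sex = factor(c("M", "F", "F", "M", "F", "F")),
  treatment_day = factor(c(1, 2, 1, 2, 2, 1)),
  developmental_stage = factor(c("A", "C", "B", "B", "B", "A")),
  treatment = c("Control", "Control", "Control",
                "Treatment", "Treatment", "Treatment")
)
coldata <- animals[rep(1:6, each = 3), ]
coldata$tissue <- rep(paste0("Tissue", 1:3), times = 6)
coldata$group <- factor(paste(coldata$tissue, coldata$treatment, sep = "_"))

# Build the design matrix for the formula used in the question.
design <- ~ treatment_day + sex + developmental_stage + group
mm <- model.matrix(design, data = coldata)

# The design is full rank when no column is a linear combination of the others,
# i.e. when the numerical rank equals the number of columns.
full_rank <- qr(mm)$rank == ncol(mm)
print(colnames(mm))
print(full_rank)

# Cross-tabulations show how each covariate is spread across the groups.
print(table(coldata$developmental_stage, coldata$treatment))
```

For the table as posted, this check should report a full-rank design even though stage C occurs in only one animal. Keep in mind that a design can be full rank and still leave few residual degrees of freedom for effects that vary only between animals.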
Great, thank you for your help!