Hello everyone,
I am trying to pick the right design formula and would appreciate advice. Here is my experimental setup:
Week 0: controls (N=6)
Week 1: Treatment A (N=6), Treatment B (N=6), Treatment C (N=6)
Week 4: Treatment A (N=6), Treatment B (N=6), Treatment C (N=6)
In total, there are 42 independent subjects because sample retrieval required sacrificing the animals.
I will list what comparisons I would like and the design formula I have used thus far:
All variables shown below are binary dummy variables.
#What genes are differentially expressed between Week 1 and controls?
design = ~ Week_4 + Week_1
#What genes are differentially expressed between Week 4 and controls?
design = ~ Week_1 + Week_4
#What genes are differentially expressed between Week 1 Treatment A animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment A modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_A
#What genes are differentially expressed between Week 1 Treatment B animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment B modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_B
#What genes are differentially expressed between Week 1 Treatment C animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment C modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_C
#What genes are differentially expressed between Week 1 Treatment A animals and Week 1 Treatment B animals?
No clue how to do this.
Any help would be much appreciated.
As a side note, I began with the following design formula:
design = ~ groups
Where "groups" took the form of:
Control, Control, Control, Control, Control, Control, 1A, 1A, 1A, 1A, 1A, 1A, 1B, 1B, 1B, 1B, 1B, 1B, 1C, 1C, 1C, 1C, 1C, 1C, 4A, 4A, 4A, 4A, 4A, 4A, 4B, 4B, 4B, 4B, 4B, 4B, 4C, 4C, 4C, 4C, 4C, 4C
But then I realized this would not provide me with results such as how Week 1 Treatment A vs. Control is distinct from Week 1 vs. Control.
Thank you in advance for any help! And I look forward to us solving this!
Hi swbarnes2, From a GLM perspective, I think it is possible to parse these effects. Take the following equation:
This formula controls for Week and Treatment and is looking at how TreatmentA modifies the Week1 effect.
Compare it to the following:
This formula^ is just looking at the effect of Week1 while controlling for the other Week timepoints (Week0 is not listed because that would lead to multicollinearity issues).
The difference between these two equations is that one looks just at Week1 while the other looks at how TreatmentA impacts the Week_1 effect.
Maybe I am missing something, but I think that makes sense.
It looks like you should have just two variables in your design, likely Treatment and Time. This would make your problem a lot easier. Can you please go through the DESeq2 vignette and then come back if there are still questions?
Your sample metadata, 'colData', could look something like:
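A minimal sketch of such a two-column metadata table, built in base R from the group sizes given in the question. The column names, the level labels ("Control", "Week0", and so on) and the sample names are assumptions chosen for illustration, not values from the original post.

```r
# Hypothetical sample metadata: column names and level labels are assumptions.
coldata <- data.frame(
  Treatment = factor(
    c(rep("Control", 6), rep(rep(c("A", "B", "C"), each = 6), times = 2)),
    levels = c("Control", "A", "B", "C")
  ),
  Time = factor(
    c(rep("Week0", 6), rep(c("Week1", "Week4"), each = 18)),
    levels = c("Week0", "Week1", "Week4")
  )
)
# Placeholder sample names; real names would need to match the count matrix columns.
rownames(coldata) <- paste0("sample", seq_len(nrow(coldata)))

# Cross-tabulate to see which Treatment x Time cells actually contain animals.
table(coldata$Treatment, coldata$Time)
```

Storing both columns as factors with the control level listed first makes that level the reference when a design formula is expanded.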
Design formulae like this are confusing:
For example, what are the values of Treatment_A and Treatment_B? It is implied that these are columns in your metadata, but what values do they contain?

Thank you for your response, Kevin. Let me try to clarify.
To answer your question, TreatmentA and TreatmentB are dummy variables with 1's and 0's in my metadata. But let me back up and try to restate my question a bit more clearly. I truly appreciate you bearing with me in this process.
I originally used the following design formula (I also controlled for batch, sex, etc, but I am not including that here):
where groups is the following in my metadata:
Importantly, "groups" is a combined variable of both treatment and week. This is recommended by the DESeq2 vignette for assessing interactions, which is part of what I am trying to assess. I want to assess the following:
1) All 1 week subjects (A1, B1, C1) vs. controls
2) 1 week with the modifying effect of Treatment A vs. controls
I want to assess this interaction (#2 above) because I want to know the specific DEGs for Treatment A at 1 week vs. controls that are not simply due to the 1 week vs. controls comparison. That is, I want to know which genes uniquely contribute to the A1 vs. controls comparison that are not due to the 1 week vs. control comparison.
In order to get both of these assessments (#1 and #2), I'm fairly certain I have to make multiple design formulas <-- which feels very very unconventional and is the reason for my post.
I realize that most analyses use one design formula, but how else am I supposed to assess both an interaction term and a non-interaction term for DEGs?
So, you were actually manually creating a model matrix, but I think that you are implementing it incorrectly. The model.matrix() function just allows you to better understand which coefficients to use, but we construct this matrix like this:
The formula used in DESeq2 is still then
~ Treatment + Time + Treatment:Time
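As a rough illustration of what model.matrix() produces for this formula, the sketch below expands it in base R. The metadata table is hypothetical (column names and level labels are assumptions), and the rank check is an extra step added here, not something stated in the thread. It is included because, in the layout described in the question, controls exist only at Week 0, so not every Treatment by Time combination is observed; DESeq2 stops with a "not full rank" error when a design has more coefficients than can be estimated.

```r
# Hypothetical metadata matching the group sizes in the question.
coldata <- data.frame(
  Treatment = factor(
    c(rep("Control", 6), rep(rep(c("A", "B", "C"), each = 6), times = 2)),
    levels = c("Control", "A", "B", "C")
  ),
  Time = factor(
    c(rep("Week0", 6), rep(c("Week1", "Week4"), each = 18)),
    levels = c("Week0", "Week1", "Week4")
  )
)

# All 42 samples: compare requested coefficients with estimable ones.
mm_all <- model.matrix(~ Treatment + Time + Treatment:Time, data = coldata)
ncol(mm_all)      # number of coefficients the formula asks for
qr(mm_all)$rank   # number of linearly independent columns

# Treated animals only (A/B/C at Week 1 and Week 4): a complete 3 x 2 layout.
treated <- droplevels(subset(coldata, Treatment != "Control"))
mm_treated <- model.matrix(~ Treatment + Time + Treatment:Time, data = treated)
colnames(mm_treated)
qr(mm_treated)$rank
```

If the rank is smaller than the number of columns, as should be the case for the full table here, the interaction formula cannot be fitted as written. Restricting the interaction model to the treated animals, or using a single combined group variable, are two possible ways around that.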
Irrespective, your first part should be easy to do via the group variable.
For your second part, I think that it is possible via the standard interaction design. Please take a look at Example 3 at the very end of the manual page for results(), accessible via:
If you still require help, I may suggest seeking out a statistician nearby.
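For the comparisons listed in the question, one possible approach with the single group variable is to write each one as a numeric contrast over the coefficients of ~ group. The sketch below builds those vectors in base R only. The helper make_contrast() is a hypothetical name introduced here, and the DESeq2 calls appear only as comments because they need a fitted object (called dds here by assumption).

```r
# Group labels as in the original post; Control is listed first so it is the reference.
lv <- c("Control", "1A", "1B", "1C", "4A", "4B", "4C")
group <- factor(rep(lv, each = 6), levels = lv)

# Coefficients of ~ group: an intercept (Control) plus one "vs Control" term per group.
mm <- model.matrix(~ group)
coefs <- colnames(mm)

# Hypothetical helper: a named numeric vector with one weight per coefficient.
make_contrast <- function(weights) {
  v <- setNames(numeric(length(coefs)), coefs)
  v[names(weights)] <- weights
  v
}

# Average of the three Week 1 groups vs Control.
week1_vs_ctrl <- make_contrast(c(group1A = 1/3, group1B = 1/3, group1C = 1/3))
# Week 1 Treatment A vs Control.
a1_vs_ctrl <- make_contrast(c(group1A = 1))
# Week 1 Treatment A vs Week 1 Treatment B.
a1_vs_b1 <- make_contrast(c(group1A = 1, group1B = -1))
# Part of "1A vs Control" not shared with "Week 1 vs Control".
a1_beyond_wk1 <- a1_vs_ctrl - week1_vs_ctrl

# Not run: with a DESeqDataSet `dds` fitted using design = ~ group, a vector
# can be passed as results(dds, contrast = unname(a1_beyond_wk1)).
# Simple pairwise case: results(dds, contrast = c("group", "1A", "1B")).
```

Note that in a1_beyond_wk1 the Control term cancels, so that comparison reduces to Week 1 Treatment A against the average of all Week 1 groups. All of these come from one fitted model, so separate design formulas per question should not be needed.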