Question

Picking the right design formula

donnenfieldjonah • 0

Hello everyone,

I am trying to pick the right design formula and would appreciate advice. Here is my experimental setup:

Week 0: controls (N=6)

Week 1: Treatment A (N=6), Treatment B (N=6), Treatment C (N=6)

Week 4: Treatment A (N=6), Treatment B (N=6), Treatment C (N=6)

In total, there are 42 independent subjects because sample retrieval required sacrificing the animals.

I will list what comparisons I would like and the design formula I have used thus far:

All variables shown below are binary dummy variables.

#What genes are differentially expressed between Week 1 and controls?
design = ~ Week_4 + Week_1

#What genes are differentially expressed between Week 4 and controls?
design = ~ Week_1 + Week_4

#What genes are differentially expressed between Week 1 Treatment A animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment A modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_A

#What genes are differentially expressed between Week 1 Treatment B animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment B modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_B

#What genes are differentially expressed between Week 1 Treatment C animals and controls (that aren't simply due to being in Week 1)? This could also be thought of as: How does Treatment C modify being in Week 1?
design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_C

#What genes are differentially expressed between Week 1 Treatment A animals and Week 1 Treatment B animals?
No clue how to do this.

Any help would be much appreciated.

As a side note, I began with the following design formula:

design = ~ groups

Where "groups" took the form of:

Control, Control, Control, Control, Control, Control, 1A, 1A, 1A, 1A, 1A, 1A, 1B, 1B, 1B, 1B, 1B, 1B, 1C, 1C, 1C, 1C, 1C, 1C, 4A, 4A, 4A, 4A, 4A, 4A, 4B, 4B, 4B, 4B, 4B, 4B, 4C, 4C, 4C, 4C, 4C, 4C

But then I realized this would not provide me with results such as how Week 1 Treatment A vs. Control is distinct from Week 1 vs. Control.

Thank you in advance for any help! And I look forward to us solving this!

deseq2 design experiment • 1.8k views

ADD COMMENT • link 5.4 years ago donnenfieldjonah • 0

score 0 · Answer 1 · 2020-09-12

swbarnes2 ★ 1.4k

You need to start by reading some tutorials and vignettes. That's not how you write designs in DESeq2. Also, since you have no week 1 control, I don't think you can separate changes due to treatment from changes due to time. You just can't answer that, no matter how clever the algorithm.
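
The confounding described here can be checked in base R. The sketch below rebuilds the asker's 0/1 dummy variables for the 42 animals and compares the number of model-matrix columns with the matrix rank. The column names follow the thread; the data frame name `meta` and the label `none` for controls are made up for illustration. Every treated animal is in either week 1 or week 4 and every control is in neither, so `Week_1 + Week_4` equals `Treatment_A + Treatment_B + Treatment_C` for every sample, which makes the columns linearly dependent.

```r
# Layout from the question: 6 controls, then 6 animals per treatment at weeks 1 and 4.
week      <- rep(c(0, 1, 1, 1, 4, 4, 4), each = 6)
treatment <- rep(c("none", "A", "B", "C", "A", "B", "C"), each = 6)

# `meta` is an illustrative name; the dummy columns mirror those in the thread.
meta <- data.frame(
  Week_1      = as.integer(week == 1),
  Week_4      = as.integer(week == 4),
  Treatment_A = as.integer(treatment == "A"),
  Treatment_B = as.integer(treatment == "B"),
  Treatment_C = as.integer(treatment == "C")
)

mm <- model.matrix(
  ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_A,
  data = meta
)

n_coef  <- ncol(mm)     # coefficients the formula asks for
mm_rank <- qr(mm)$rank  # coefficients the data can actually identify

# The week dummies and the treatment dummies sum to the same 0/1 vector.
same_sum <- all(meta$Week_1 + meta$Week_4 ==
                meta$Treatment_A + meta$Treatment_B + meta$Treatment_C)
```

When `mm_rank` is smaller than `n_coef`, the design is not full rank, and DESeq2 stops with a message that the model matrix is not full rank rather than fitting it.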

ADD COMMENT • link 5.4 years ago swbarnes2 ★ 1.4k

Hi swbarnes2, From a GLM perspective, I think it is possible to parse these effects. Take the following equation:

design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_A

This formula controls for week and treatment, and looks at how Treatment_A modifies the Week_1 effect.

Compare it to the following:

design = ~ Week_4 + Week_1

This formula just looks at the effect of Week_1 while controlling for the other week timepoints (Week 0 is not listed because including it would lead to multicollinearity issues).

The difference between these two formulas is that one looks only at Week_1, while the other looks at how Treatment_A modifies the Week_1 effect.

Maybe I am missing something, but I think that makes sense.

ADD REPLY • link 5.4 years ago donnenfieldjonah • 0

It looks like you should have just two variables in your design, likely Treatment and Time. This would make your problem a lot easier. Can you please go through the DESeq2 vignette and then come back if there are still questions?

Your sample metadata, 'colData', could look something like:

  Treatment     Time
  A             Week1
  A             Week1
  A             Week1
  ...           ...
  B             Week1
  B             Week2
  B             Week2
  ...           ...
  C             Week2
  C             Week3
  C             Week3
  ...           ...
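
For the experiment described in the question, a two-column `colData` along these lines might be built as follows. This is only a sketch: the labels `none` and `Week0` for the control animals and the sample names `s1` to `s42` are assumptions, not something given in the thread.

```r
# Two factors instead of one dummy column per level.
coldata <- data.frame(
  Treatment = factor(rep(c("none", "A", "B", "C", "A", "B", "C"), each = 6),
                     levels = c("none", "A", "B", "C")),
  Time      = factor(rep(c("Week0", "Week1", "Week1", "Week1",
                           "Week4", "Week4", "Week4"), each = 6),
                     levels = c("Week0", "Week1", "Week4")),
  row.names = paste0("s", 1:42)  # hypothetical sample names
)

# Cross-tabulate to see which Treatment x Time cells actually contain animals.
cell_counts <- table(coldata$Treatment, coldata$Time)
print(cell_counts)
```

The cross-tabulation makes the layout explicit: the controls sit only at `Week0`, and the three treatments only at `Week1` and `Week4`.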

Design formulae like this are confusing:

design = ~ Week_1 + Week_4 + Treatment_A + Treatment_B + Treatment_C + Week_1:Treatment_A

For example, what are the values of Treatment_A and Treatment_B? It is implied that these are columns in your metadata, but what values do they contain?

ADD REPLY • link 5.4 years ago Kevin Blighe ★ 4.0k

Thank you for your response, Kevin. Let me try to clarify.

To answer your question, Treatment_A and Treatment_B are dummy variables coded as 1s and 0s in my metadata. But let me back up and try to restate my question a bit more clearly. I truly appreciate you bearing with me in this process.

I originally used the following design formula (I also controlled for batch, sex, etc, but I am not including that here):

design = ~ groups

where groups is the following in my metadata:

groups
control
control
control
control
control
control
A4
A4
A4
A4
A4
A4
A1
A1
A1
A1
A1
A1
B4
B4
B4
B4
B4
B4
B1
B1
B1
B1
B1
B1
C4
C4
C4
C4
C4
C4
C1
C1
C1
C1
C1
C1

Importantly, "groups" is a combined variable of both treatment and week. This is recommended by the DESeq2 vignette for assessing interactions, which is part of what I am trying to assess. I want to assess the following:

1) All 1 week subjects (A1, B1, C1) vs. controls

2) 1 week with the modifying effect of Treatment A vs. controls

I want to assess this interaction (#2 above) because I want to know the specific DEGs for Treatment A at 1 week vs. controls that are not simply due to the 1 week vs. controls comparison. That is, I want to know which genes uniquely contribute to the A1 vs. controls comparison that are not due to the 1 week vs. control comparison.

In order to get both of these assessments (#1 and #2), I'm fairly certain I have to make multiple design formulas, which feels very unconventional and is the reason for my post.

I realize that most analyses use one design formula, but how else am I supposed to assess both an interaction term and a non-interaction term for DEGs?

ADD REPLY • link 5.4 years ago donnenfieldjonah • 0

So, you were effectively creating a model matrix by hand, but I think that you are implementing it incorrectly. The model.matrix() function mainly helps you understand which coefficients are available to test; the matrix itself is constructed like this:

model.matrix(~ Treatment + Time + Treatment:Time)

The formula used in DESeq2 is then still ~ Treatment + Time + Treatment:Time

Irrespective, your first part should be easy to do via the group variable.
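
One way the group-variable comparisons could look is sketched below. It assumes the DESeq2 package is installed and uses simulated negative-binomial counts in place of real data. The object names (`counts`, `coldata`, `dds`, and the `res_*` results) and the 500 hypothetical genes are illustrative only; the level labels follow the `groups` column shown earlier in the thread.

```r
# Sketch only: simulated counts stand in for real data.
library(DESeq2)

set.seed(1)
lv <- c("control", "A1", "B1", "C1", "A4", "B4", "C4")
coldata <- data.frame(
  groups    = factor(rep(lv, each = 6), levels = lv),  # control is the reference level
  row.names = paste0("s", 1:42)                       # hypothetical sample names
)

# 500 hypothetical genes x 42 samples, with a different mean per gene.
gene_mu <- 2^runif(500, min = 4, max = 10)
counts <- matrix(
  rnbinom(500 * 42, mu = rep(gene_mu, times = 42), size = 5),
  nrow = 500,
  dimnames = list(paste0("gene", 1:500), rownames(coldata))
)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ groups)
dds <- DESeq(dds)

# One fitted model, several comparisons: pick two levels of `groups` per call.
res_A1_vs_ctrl <- results(dds, contrast = c("groups", "A1", "control"))
res_A1_vs_B1   <- results(dds, contrast = c("groups", "A1", "B1"))

# All week 1 animals vs. controls: average the three week 1 coefficients.
# The coefficient names below should match entries of resultsNames(dds).
res_wk1_vs_ctrl <- results(
  dds,
  contrast = list(c("groups_A1_vs_control",
                    "groups_B1_vs_control",
                    "groups_C1_vs_control")),
  listValues = c(1/3, -1)
)
```

The point of the sketch is that a single `~ groups` fit can serve several questions, including Week 1 Treatment A vs. Week 1 Treatment B, by changing only the `contrast` argument rather than the design formula.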

For your second part, I think that it is possible via the standard interaction design. Please take a look at Example 3 at the very end of the manual page for results(), accessible via:

?results
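
As a rough illustration of what the standard interaction design provides, the base R sketch below builds the model matrix for the 36 treated animals only. Restricting to treated animals is an assumption of this sketch: within that subset every treatment is observed at both time points, so the interaction columns are estimable, whereas the week 0 controls have no treatment level. The data frame name `treated` is made up for illustration.

```r
# Treated animals only: 3 treatments x 2 time points, 6 animals per cell.
treated <- data.frame(
  Treatment = factor(rep(c("A", "B", "C"), times = 2, each = 6)),
  Time      = factor(rep(c("Week1", "Week4"), each = 18))
)

mm_int <- model.matrix(~ Treatment + Time + Treatment:Time, data = treated)

# Reading the columns, with A and Week1 as reference levels:
#   (Intercept)           Treatment A at Week1
#   TreatmentB            B vs. A at Week1
#   TimeWeek4             Week4 vs. Week1 within Treatment A
#   TreatmentB:TimeWeek4  how the Week4 vs. Week1 change differs in B relative to A
print(colnames(mm_int))

int_rank <- qr(mm_int)$rank  # equals ncol(mm_int) when the design is full rank
```

In DESeq2 the interaction coefficients should appear in `resultsNames()` with a dot in place of the colon, in the `genotypeII.conditionB` style used in the `?results` examples.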

If you still require help, I may suggest seeking out a statistician nearby.

ADD REPLY • link 5.4 years ago Kevin Blighe ★ 4.0k