Question

how to interpret results from 3way design in DESeq2

0

Entering edit mode

demisa • 0

@demisa-22970

Last seen 3.0 years ago

United States

I'm trying to figure out how to interpret a model design with 3 factors plus interactions.

I have young and old mice, untreated and treated, in two timepoints. I'm interested to see the treatment effect and the age effect.

> head(sample.summary)
  sample   age treatment timepoint
1     s1 young untreated       1
2     s2 young   treated       1
3     s3 young untreated       3
4     s4 young   treated       3
5     s5   old untreated       1
6     s6   old   treated       1
7     s7   old untreated       3
8     s8   old   treated       3

I am treating it as a time-series analysis, with 3 categorical factors plus interactions.

dds <- DESeqDataSetFromTximport(all.txi.kallisto, sample.summary, ~ age*treatment*timepoint)
dds <- DESeq(dds, test = "LRT", reduced = ~ age+treatment+timepoint)
resultsNames(dds)

[1] "Intercept"                          "age_young_vs_old"                   "treatment_treated_vs_untreated"              
[4] "timepoint_3_vs_1"               "ageyoung.treatmenttreated"              "ageyoung.timepoint3"             
[7] "treatmenttreated.timepoint3"          "ageyoung.treatmenttreated.timepoint3"

1) Do just two timepoints qualify the analysis as a time-series?

2) What is the best practice? One model with 3 factors (like above) or four models with 2-factors (one model for young, one for old, one for untreated, one for treated)?

3) How should we interpret the resultsNames?

"treatmenttreatedvs_untreated" is the "treatment effect for old in 1mo" or the "treatment effect for old"?
"ageyoung.treatmenttreted" is the "treatment effect for young compared to old, in 1mo" or the "treatment effect for young compared to old"?
"treatmenttreated.timepoint3" is the "timepoint effect for treated compared to untreated" or "timepoint effect for treated compared to untreated, in old" ?
"ageyoung.treatmenttreated.timepoint3" is the "timepoint effect for treated vs untreated, and young vs old"?

4) To get the total treatment effect, I would need to combine all the comparisons that contain the treatment factor, correct? So, "treatmenttreatedvs_untreated", "ageyoung.treatmenttreated", "treatmenttreated.timepoint3", "ageyoung.treatmenttreated.timepoint3"

5) How would I get the "treatment effect for old in 3mo" or the "age effect for untreated in 0mo" and so on?

The head of my sessionInfo is:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
other attached packages: DESeq2_1.24.0

Any insight or material that could help me unpuzzle would be so greatly appreciated! Thank you!

deseq2 • 755 views

ADD COMMENT • link updated 4.2 years ago by Michael Love 41k • written 4.2 years ago by demisa • 0

score 0 · Answer 1 · 2020-02-22

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

First check out the diagram in the DESeq2 vignette in the interaction section to get an idea how the interaction terms change the interpretation of the main effects. There is also some useful examples in ?results.

The same principle applies when you do ~x*y*z in a design, which adds first-order interaction terms between all pairs of variables, and then one second-order interaction between all three variables.

The choice of which model to run is up to you, and I don't have sufficient time to give design guidance per question unfortunately. However, once you have a sense for what the coefficients represent (e.g. see diagram), then you can remove individual terms with the LRT. There are various LRT you can perform by removing terms from the design. It may help in your case to note that ~x*y*z can be written out in long form as x + y + z + x:y + x:z + y:z + x:y:z. I always recommend that users use this formulation instead of the shorthand, so that it's clear what terms are being used in the model.

ADD COMMENT • link 4.2 years ago Michael Love 41k

0

Entering edit mode

Thank you so much for your time Mike. I have been reading these resources again and again, but my puzzling persists. The basis of my confusion is how the introduction of the third factor changes the interpretation of each comparison. For example:

For a design with one factor (~ treatment), we see the treatment effect on gene expression.

If we add one more factor (~ age + treatment), we see the treatment effect on expression and the age effect on expression.

If we add their interaction (~ age + treatment + age:treatment), then the main effect (treatmenttreatedvs_untreated) changes by adding the age control; and we see the treatment effect for the "old" level of age. Which makes me interpret it like each effect (resultName) is a combination of both factors.

If this is correct then if I add one more factor plus all the interactions (~ treatment + age + timepoint + treatment:age + treatment:timepoint + age:timepoint + treatment:age:timepoint), then all the effects will be a combination of all three factors. So for example, "treatmenttreatedvs_untreated" is the treatment effect for old in 1mo, because old is the control level for age and 1mo is the control level for timepoint. Does this make sense?

PS: Just to make sure I haven't missed sth, this is the vignette you are referring to, correct?

ADD REPLY • link 4.2 years ago demisa • 0

0

Entering edit mode

I'd recommend meeting with a statistician or someone familiar with linear models to discuss the interpretation of the coefficients in a design with second order interactions. It's not very easy to explain in a comment box / via email, but really helps from a face-to-face meeting.

Re: vignette, that one is very old. To find the most up-to-date vignette for a particular package, go to https://bioconductor.org/pacakges/DESeq2 and then click on HTML under the header Documentation. Or from within R you can do vignette("DESeq2").

ADD REPLY • link 4.2 years ago Michael Love 41k