Multifactor design in DESeq2 - infection effect over time on three different plant cultivars
n85825 • 0
@n85825-15420

Hi,

I am running DESeq2 to get DEGs from the experiment below. The experimental design has three factors with the following levels:

  • Time (1dpi, 4dpi)
  • Cultivar (J, Q, R)
  • Treatment (infected, control)

time  cultivar  treatment
1dpi  J         control
1dpi  J         control
1dpi  J         control
4dpi  J         control
4dpi  J         control
4dpi  J         control
1dpi  J         infected
1dpi  J         infected
1dpi  J         infected
4dpi  J         infected
4dpi  J         infected
4dpi  J         infected
1dpi  Q         control
1dpi  Q         control
1dpi  Q         control
4dpi  Q         control
4dpi  Q         control
4dpi  Q         control
1dpi  Q         infected
1dpi  Q         infected
1dpi  Q         infected
4dpi  Q         infected
4dpi  Q         infected
4dpi  Q         infected
1dpi  R         control
1dpi  R         control
1dpi  R         control
4dpi  R         control
4dpi  R         control
4dpi  R         control
1dpi  R         infected
1dpi  R         infected
1dpi  R         infected
4dpi  R         infected
4dpi  R         infected
4dpi  R         infected

I am interested in DEGs responding to infection over time in the cultivars. I first used the design ~ Treatment + Cultivar + Time + Time:Treatment + Cultivar:Treatment, but a colleague suggested adding the three-way interaction (Time:Cultivar:Treatment) so that the interaction among all three factors is included.

Now, I am struggling with these issues:

  1. Which design formula answers my question (DEGs responding to infection over time in the cultivars)? The formula above, or the same formula with the three-way interaction added?

  2. Would grouping factors (combining cultivar and time into one factor called group) also answer the question? For example, ~ group + treatment + group:treatment

If neither of these approaches works, is there another way to get DEGs responding to infection over time in the cultivars?

Any help is greatly appreciated.

Best, Nourolah

quinoa virus mutifactor design DESeq2 • 801 views
@mikelove

I cannot tell you which design is correct; that depends on your assumptions and biological questions, and is often beyond the software support that I can provide here. In short, if you think that the effect of X can vary with Z, then you need an additional interaction term of X with Z. It’s really the same as with first-order interactions, including in how it changes the interpretation of the coefficients.

Regarding (2), yes, these designs are in some ways equivalent. Build a model matrix with each of the two approaches and compare them to get a sense for the meaning of the coefficients, or consult with a statistical collaborator.
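
As a rough illustration of that comparison, here is a minimal base-R sketch. It assumes a hypothetical sample table (`samples`) rebuilt from the layout in the question, with three replicates per time/cultivar/treatment cell, and a `group` factor made by pasting cultivar and time together. It uses only `model.matrix()` and does not touch DESeq2 itself. The grouped design is compared against the fully crossed `time * cultivar * treatment` design.

```r
# Hypothetical sample table mirroring the question's layout:
# 2 time points x 3 cultivars x 2 treatments, 3 replicates per cell.
samples <- expand.grid(
  replicate = 1:3,
  treatment = c("control", "infected"),
  time      = c("1dpi", "4dpi"),
  cultivar  = c("J", "Q", "R"),
  stringsAsFactors = FALSE
)
samples$time      <- factor(samples$time, levels = c("1dpi", "4dpi"))
samples$cultivar  <- factor(samples$cultivar, levels = c("J", "Q", "R"))
samples$treatment <- factor(samples$treatment, levels = c("control", "infected"))

# "group" combines cultivar and time into one six-level factor.
samples$group <- factor(paste(samples$cultivar, samples$time, sep = "_"))

# Approach A: fully crossed design with all interactions.
mm_full <- model.matrix(~ time * cultivar * treatment, data = samples)

# Approach B: grouped factor plus its interaction with treatment.
mm_group <- model.matrix(~ group + treatment + group:treatment, data = samples)

# Inspect the coefficient names produced by each approach.
colnames(mm_full)
colnames(mm_group)

# If the rank of the combined matrix equals the rank of each matrix alone,
# the two designs span the same space and differ only in how the
# coefficients are parameterized.
rank_full     <- qr(mm_full)$rank
rank_group    <- qr(mm_group)$rank
rank_combined <- qr(cbind(mm_full, mm_group))$rank
c(full = rank_full, group = rank_group, combined = rank_combined)
```

Under these assumptions both matrices have 12 columns, one per time/cultivar/treatment cell, so the fitted means agree and only the meaning of the individual coefficients changes.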


Thanks Michael for your response.

Is there any way in DESeq2 to determine the statistical significance of individual factors or their interactions in this type of multifactor design?

Thanks.


Yes, take a look at ?results


My apologies for asking another question!

Based on the biological question above, I used the code below:

dds <- DESeqDataSetFromTximport(txi, sampletable, design = ~ time + cultivar + treatment + time:treatment + cultivar:treatment + time:cultivar:treatment)

dds$treatment <- relevel(dds$treatment, ref = "control")

the output of resultsNames(dds) is:

> resultsNames(dds)
 [1] "Intercept"                                  "time_4dpi_vs_1dpi"                         
 [3] "cultivar_Q_vs_J"                     "cultivar_R_vs_J"                 
 [5] "treatment_infected_vs_control"              "time4dpi.treatmentinfected"                
 [7] "cultivarQ.treatmentinfected"             "cultivarR.treatmentinfected"         
 [9] "time4dpi.cultivarQ.treatmentcontrol"     "time4dpi.cultivarR.treatmentcontrol" 
[11] "time4dpi.cultivarQ.treatmentinfected"    "time4dpi.cultivarR.treatmentinfected"

1) Why do coefficients [9] and [10] contain control, even though I defined it beforehand as the reference level?

2) Looking at the DEGs (below) for, say, [12] and [10], it seems the reference level has not been taken into account in the three-way interaction (please correct me if I am wrong). So how should [11] versus [9], and [12] versus [10], be treated if I want to see the effect of infected versus control?

> res <- results(dds, alpha = 0.06, name = c("time4dpi.cultivarR.treatmentinfected"))
> summary(res)

out of 49206 with nonzero total read count
adjusted p-value < 0.06
LFC > 0 (up)       : 391, 0.79%
LFC < 0 (down)     : 81, 0.16%
outliers [1]       : 104, 0.21%
low counts [2]     : 8540, 17%
(mean count < 1)

> res <- results(dds, alpha = 0.06, name = c("time4dpi.cultivarR.treatmentcontrol"))
> summary(res)

out of 49206 with nonzero total read count
adjusted p-value < 0.06
LFC > 0 (up)       : 22, 0.045%
LFC < 0 (down)     : 31, 0.063%
outliers [1]       : 104, 0.21%
low counts [2]     : 7591, 15%
(mean count < 1)

Thanks in advance for your help!


This is beyond the scope of the support I can provide here, that is, interpretation of individual terms in a complex design, and how the choice of design influences the meaning of these terms.

I'd recommend meeting with a statistician to discuss what these terms represent.
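
For readers who want to explore the terms themselves before such a discussion, here is a hedged base-R sketch. It assumes a hypothetical `samples` table rebuilt from the layout in the question, and it fits an ordinary linear model to simulated values purely to compare parameterizations (this is not the DESeq2 GLM). It contrasts the formula as posted, which has no `time:cultivar` term, with the fully crossed `time * cultivar * treatment` formula. Base R separates interaction names with `:` where `resultsNames(dds)` shows `.`.

```r
# Hypothetical sample table mirroring the question's layout (3 replicates per cell).
samples <- expand.grid(
  replicate = 1:3,
  treatment = c("control", "infected"),
  time      = c("1dpi", "4dpi"),
  cultivar  = c("J", "Q", "R"),
  stringsAsFactors = FALSE
)
samples$time      <- factor(samples$time, levels = c("1dpi", "4dpi"))
samples$cultivar  <- factor(samples$cultivar, levels = c("J", "Q", "R"))
samples$treatment <- factor(samples$treatment, levels = c("control", "infected"))

# Design as posted in the question: the time:cultivar two-way term is absent.
mm_question <- model.matrix(
  ~ time + cultivar + treatment + time:treatment + cultivar:treatment +
    time:cultivar:treatment,
  data = samples
)

# Fully crossed design: every lower-order term is present.
mm_hier <- model.matrix(~ time * cultivar * treatment, data = samples)

# Columns naming the control level appear only in the design as posted.
grep("treatmentcontrol", colnames(mm_question), value = TRUE)
grep("treatmentcontrol", colnames(mm_hier), value = TRUE)

# Fit both codings to the same simulated values.
set.seed(1)
y <- rnorm(nrow(samples))
b_question <- lm.fit(mm_question, y)$coefficients
b_hier     <- lm.fit(mm_hier, y)$coefficients

# "infected" minus "control" three-way coefficient in the posted coding ...
diff_R <- unname(b_question["time4dpi:cultivarR:treatmentinfected"] -
                 b_question["time4dpi:cultivarR:treatmentcontrol"])
# ... compared with the three-way interaction coefficient in the crossed coding.
three_way_R <- unname(b_hier["time4dpi:cultivarR:treatmentinfected"])
all.equal(diff_R, three_way_R)

# The "control" column of the posted coding plays the role of time:cultivar.
ctrl_R      <- unname(b_question["time4dpi:cultivarR:treatmentcontrol"])
time_cult_R <- unname(b_hier["time4dpi:cultivarR"])
all.equal(ctrl_R, time_cult_R)
```

In this sketch the `treatmentcontrol` columns arise because the posted formula omits `time:cultivar`, so R codes treatment with one indicator per level inside the three-way term. If the same reparameterization carries over to the DESeq2 fit, the [12] versus [10] comparison might be requested with the list form of the `contrast` argument, for example `results(dds, contrast = list("time4dpi.cultivarR.treatmentinfected", "time4dpi.cultivarR.treatmentcontrol"))`. Whether that comparison answers the biological question is still best checked with a statistician.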
