Question: Time-series analysis with DESeq2
0
7 months ago by
Assa Yeroslaviz1.4k
Munich, Germany
Assa Yeroslaviz1.4k wrote:

I was wondering how one can do a deseq2 analysis of multiple time points when trying to test for a WT vs TREAT difference at any of the given times. Let's say I have three time points (1h,2h,3h) and two conditions (WT, TREAT)

In edgeR one can create a contrast matrix and pass the complete matrix to the glm() function.

con<- makeContrasts(
TP1.WTvs.T  = T1.T - T1.WT,
TP2.WTvs.T  = T2.T - T2.WT,
TP3.WTvs.T  = T3.T - T3.WT, levels=design)

glmQLFTest(fit, contrast = TP1.WTvs.T) # single TP comparison
glmQLFTest(fit, contrast = con)        # compare all time points in one go.


With this contrast matrix i can call each contrast separately to detect genes differentially expressed in each of the single time-points or to compare changes in all TP.

I'm interested to understand how this can be done using the DESeq2 package.

I know i can use the interaction terms after running the results() command. something like:

 resultsNames(ddsTC)

##  [1] "Intercept"           "strain_T_vs_WT"    "time_TP1_vs_TP0"      "time_TP2_vs_TP0"
##  [5] "..."      "..."     "..."     "strainT.timeTP1"
##  [9] "strainT.timeTP2"  ...


By calling results(dds, name="time_TP2_vs_TP0") I can detect all genes in WT whcih ae changed between TP2 and TP0.

The call for results(dds, name="strainT.timeTP1") will test if the Treat vs WT fold change is different at TP1 than at TP0.

Where i'm not sure is by following this logic, how can I do the same I did with edger? How do I identify genes which are changed across all three time points?

thanks Assa

modified 7 months ago by Michael Love25k • written 7 months ago by Assa Yeroslaviz1.4k
1
7 months ago by
Michael Love25k
United States
Michael Love25k wrote:

What you want to do is described in the time series example in the workflow. You can perform an LRT as shown there.

thanks Michael for the fast response.

so transferring it from the workflow, something like this should do it for me?

ddsTC <- DESeqDataSet(counts, ~ strain + time + strain:time)
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + time)


Does this apply only for cases where I have separate WT and treated samples for each of the time points?

Yes.

Yes, you need WT and treated samples at every time point.

great. Is there an interaction where I can look for genes with significant changes in all time points? I mean a constant trendover all time points>

great. Is there an interaction where I can look for genes with significant changes in all time points? I mean a constant trend over all time points>

great. Is there an interaction where I can look for genes with significant changes in all time points? I mean a constant trend over all time points?

There's not really a term for that. You may want to meet with a statistician if you have further questions about how the linear model coefficients in a two group time series work.

Let's say I have this sample information

          condtion timepoint replicate  group
Sample_1        WT        0h         1  WT_0h
Sample_2        WT        0h         2  WT_0h
Sample_3        WT        0h         3  WT_0h
Sample_4        WT        1h         1  WT_1h
...
Sample_10       WT        8h         1  WT_8h
Sample_11       WT        8h         2  WT_8h
Sample_12       WT        8h         3  WT_8h
Sample_13      KO1        0h         1 KO1_0h
Sample_14      KO1        0h         2 KO1_0h
...
Sample_45      KO3        4h         3 KO3_4h
Sample_46      KO3        8h         1 KO3_8h
Sample_47      KO3        8h         2 KO3_8h
Sample_48      KO3        8h         3 KO3_8h


And I'm analyzing this with Wald and LRT to get the following interaction terms

 dds <- DESeqDataSetFromMatrix(countData = counts,
design = ~ group)
dds <- DESeq(object = dds, test = "Wald" )

res1 <- results(object = dds, contrast=c("group", "KO1_0h" ,"WT_0h"), alpha = 0.001)
res2 <- results(object = dds, contrast=c("group", "KO2_0h" ,"WT_0h"), alpha = 0.001)
res3 <- results(object = dds, contrast=c("group", "KO3_0h" ,"WT_0h"), alpha = 0.001)
...


and than again

dds.lrt <- DESeqDataSetFromMatrix(countData = counts,
design = ~ condtion + timepoint + condtion:timepoint)

dds.lrt <- DESeq(object = dds.lrt, test = "LRT", reduced = ~ condtion + timepoint)

resultsNames(dds.lrt)
[1] "Intercept"               "condtion_KO1_vs_WT"      "condtion_KO2_vs_WT"      "condtion_KO3_vs_WT"      "timepoint_1h_vs_0h"      "timepoint_4h_vs_0h"
[7] "timepoint_8h_vs_0h"      "condtionKO1.timepoint1h" "condtionKO2.timepoint1h" "condtionKO3.timepoint1h" "condtionKO1.timepoint4h" "condtionKO2.timepoint4h"
[13] "condtionKO3.timepoint4h" "condtionKO1.timepoint8h" "condtionKO2.timepoint8h" "condtionKO3.timepoint8h"

res.lrt.1 <- results(object = dds.lrt, name = "condtionKO1.timepoint1h", alpha = 0.001)


In the above test I compare the timepoints and conditions on a pair-wise basis. In the second one I use the LRT test model. "condtion_KO1_vs_WT" gives me genes changed over all timepoints between KO1 and WT, "condtionKO1.timepoint1h" would give me the differences between KO1 vs WT at TP1, controlling for baseline.

What would be the difference between res1 and res.lrt.1? and more importantly, which of the two options would make more sense to analyze?

1

Hi Assa,

I would recommend at this point that you consult with a statistician. I don't have unlimited time for the support site, and so I need to restrict myself to software related questions. Your questions are important, but they have to do with linear modeling, and how different designs answer different questions. If the documentation and workflows we have is not sufficient to bridge the gap, what you would benefit from is to sit down with a statistician or anyone with experience in linear modeling, and they can explain how different coefficients work in a linear model.