Question: DEseq2 time-series design.
1
4.6 years ago by
shao80
Germany
shao80 wrote:

Hi,

I am working with time-series RNA-seq data, and would like to make sure I have the right design for my purpose.

The experiments like:

Time = rep(c("0h", "2h", "4h", "6h", "8h"), each = 2)
Treat = rep(c("Control", "Treat"), each = 10)
nameD <- paste(Treat, Time, c(rep(LETTERS[1:2], 5), rep(LETTERS[3:4], 5)), sep = "_")
coldata <- data.frame(row.names = nameD, Time = Time, Treat = Treat)


> coldata
Time   Treat
Control_0h_A   0h Control
Control_0h_B   0h Control
Control_2h_A   2h Control
Control_2h_B   2h Control
Control_4h_A   4h Control
Control_4h_B   4h Control
Control_6h_A   6h Control
Control_6h_B   6h Control
Control_8h_A   8h Control
Control_8h_B   8h Control
Treat_0h_C     0h   Treat
Treat_0h_D     0h   Treat
Treat_2h_C     2h   Treat
Treat_2h_D     2h   Treat
Treat_4h_C     4h   Treat
Treat_4h_D     4h   Treat
Treat_6h_C     6h   Treat
Treat_6h_D     6h   Treat
Treat_8h_C     8h   Treat
Treat_8h_D     8h   Treat



This is longitudinal experiments, two biological replicates for treat and control respectively, and each sample is repeated sequenced in different time.  Besides, treatments started before 0 hour, 0 hour only mean when the samples were sequenced.

I would like to find genes showing difference in any time point between Treat and control. I think the likelihood test is right choice.

The full model I used as

design(dds) <- formula(~ Time + Treat + Time:Treat)

the reduced model I used is different from Vignettes of DESeq2, I used

reduced_1 = formula(~ Time)

rather than

reduced_2 = formula(~ Time + Treat)

I think likelihood test with reduced_2 could not give small pvalue to genes showing consistent difference between treat and control, i.e., genes expression are parallel between two condition.

A few questions:

-  Am I right to use reduced_1? (From the discussion and comments I think it fits my purpose).

- Because it is repeated measurements for each sample, the genes expression also correlate. Would it be a problem for the test?

- I have tried two test so far:

- Setting one (do not consider interaction): full = formula(~ Time + Treat), reduced = formula(~ Time)

- Setting two (consider interaction): full = formula(~ Time + Treat + Time:Treat), reduced = formula(~ Time)

- To my surprise, setting one actually find more DE genes than setting two basing on FDR <= 0.01. I noticed that fold change is small between treat and control, is it connected to the model complexity and lead to less DE genes? or better reasons?

timecourse deseq2 design • 10.0k views
modified 4.1 years ago by lmolla10 • written 4.6 years ago by shao80
Answer: DEseq2 time-series design.
1
4.6 years ago by
Michael Love23k
United States
Michael Love23k wrote:

Hi, Using a reduced formula of ~ time + treat means that genes which show a consistent difference from time 0 onward will not have a small p value. This is what we think you want, because differences between groups at time 0 should be controlled for, e.g. random differences between the individuals chosen for each treatment group which are observable before the treatment takes effect.

ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by Michael Love23k

But cshao has no sample groups in his design table, hence this won't work that way, I think. Hence my question below.

True, it depends on the experiment. Even if not the same samples, another reason to control for time 0 would be: individuals in control group kept in container 1, individuals in treatment group kept in container 2. Context can be more or less important depending on the details of experiment.

Good point, I edited my posts.

I don't think one can control for the sample effect here, because the comparison of interest is across treatment group, which contain different samples. Maybe Simon sees a way though. Given that at time 0, you expect a treatment effect already, I would use your "setting 2" above, with the interaction term in the full model to allow for varying treatment effect at different times, and then a reduced of ~ time

"I noticed that fold change is small between treat and control"

Read the section in ?results which talks about results tables for the likelihood ratio test. Briefly, there is a single likelihood ratio statistic and p-value, but there are many possible log fold changes to show. The results table prints the last one by default, and you need to use the name argument to pull out others. See resultsNames(dds) for all of the log fold changes fit by the DESeq() function.

I don't typically like to focus on the total number of genes with a small FDR; the important thing is to build the correct set for your question of interest.

Thanks. The fold change I talked about is the largest fold change in any time points for each genes, I did the calculation myself. About the second question, i.e., the gene expression correlation among time points, how do you think about it?

Answer: DEseq2 time-series design.
0
4.6 years ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:

Before getting to the actual question, could you please clarify: Are these 16 independent samples, such taht each sample underwent an experiment that was allowed to run until the given time? Or are these four samples, on which intermediate measurements were made (e.g., by taking aliquots, in case it is some liquid cell culture)?

Thanks for the good questions. There are four samples. I missed some important info and have edited my post.

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by shao80
Answer: DEseq2 time-series design.
0
4.1 years ago by
lmolla10
United States
lmolla10 wrote:

Hi cshao  ,

I am doing a similar analysis. Can you write the final code that you decided to use here.

I am specifically interested in the design.

From what I understand the better model to use is :

full = formula(~ Time + Treat + Time:Treat), reduced = formula(~ Time)

From this test the genes with small pvalue are those that show a treatment dependent change at any timepoint ?

The proper design depends on your experimental design.

Your full and reduced designs above are probably not what you want, as this would include genes which are DE at time 0. Typically, we would suggest:

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time + treat)

Here is an example:

http://www.bioconductor.org/help/workflows/rnaseqGene/#time

Hi Michael,

Thank you very much for your reply and for being so helpful. My experiment is a bit more complicated but I am indeed interested in changes at time0 as well.

I have a time course experiment and I am also interested in differences at time0. My variables are :

condition=wt and knockdown of protein X (in triplicates)

time=day0, day1,day2

clone=A,B,C,D,E,F [wt=cloneA,B,C; knockdown = clone D,E,F) -> so this means that for clone A I have RNAseq data for time=day0,1,2

time0 is the moment when the cells are induced to differentiate. BUT  the knockdown happened before day0. Thus I am interested in finding protein X dependent changes in any time point including time0 [I expect differences at time0] . I also expect a lot of gene expression changes due to differentiation so I want to be able to determine which gene expression changes are specifically dependent on protein X.

To my understanding the design I need is :

design(dds) <- ~ clone + time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~  clone + time)

I am not sure if I understand the deseq2 design properly but any advice/suggestion would be greatly appreciated.

Thank you!

My sample table:

 clone condition time A KD d0 B KD d0 C KD d0 A KD d1 B KD d1 C KD d1 A KD d2 B KD d2 C KD d2 D control d0 E control d0 F control d0 D control d1 E control d1 F control d1 D control d2 E control d2 F control d2

Linda

hi Linda,

In that case, you can use

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

...to generate p-values which test for any difference due to treatment including time=0 (and which allows for differences in treatment at different times).

You can use another design if you want to test treatment effects at each time. See the suggestion here:

A: DESEq2 comparison with mulitple cell types under 2 conditions

With your particular experimental design, you can't control for the clone differences and test for effects due to treatment because they are confounded. In other words, there is no way to disentangle A+B+C effect from the KD effect, because these are exactly the same samples. So you have to leave clone out of the DESeq2 model for this experimental design. You could incorporate the clone information into limma using the duplicateCorrelation function (see the limma package for details).

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by Michael Love23k

Thank you very much for your reply!

I am sorry to bother you again but when I run the design above I get :

> resultsNames(dds)
[1] "Intercept"                    "time_d1_vs_d0"                "time_d2_vs_d0"
[4] "condition_KD_vs_control" "timed1.condition.KD"      "timed2.condition.KD"

I want to be able to produce a table where I have all the genes that are differentially expressed due to the KD ( for all time points) and thus I am trying to understand what each of the resultNames(dds) means :

timed1.condition.KD = the affect of KD on timepointd1 ?
timed2.condition.KD = the affect of KD on timepointd2 ?
time_d1_vs_d0 = differentiatially expressed genes between d0 and d1 ?
time_d2_vs_d0 = differentially expressed genes between d0 and d2 ?
"Intercept" = ???? what does this mean
"condition_KD_vs_control" = differentially expressed genes between KD and control at any time point or at day0??

I am not exactly sure if there is a way to make a table for all KD dependent changes regardless of the time point.  If my assumptions above are correct I would do the following...

results(dds, name="condition_KD_vs_control", test="Wald")

Linda

See my answer in another recent post:

A: DESeq2 LRT output help

I know this is an old post, but I have a question regarding the comment "to generate p-values which test for any difference due to treatment including time=0

If you use the formula

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

I understand that you are seeing if treatment causes a change in gene expression at any time point, but for the treated term in the formula, aren't you finding differences between all treated vs all untreated? Not specifically treated vs untreated at time 0?

It seems like this could cause genes to come back as significantly different because it would look like the sample size is tripled when its actually just the same sample on a different day.

For example, with my own data, I have 3 time points and 2 conditions. If I just compare treated vs untreated using Wald test at time 0, I get ~10 significantly different genes.

If I use the time series analysis to see which genes change over time due to the treatment,

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time + treat)

I get 0 genes as different.

When I try to see which genes are different at any time point due to treatment,

design(dds) <- ~ time + treat + time:treat
dds <- DESeq(dds, test="LRT", reduced = ~ time)

I now get almost 400 genes as significantly different.

Could this simply be because I've tripled my data set comparing treated to untreated, or is something else going on?

Thanks for your help on this!!

ADD REPLYlink modified 10 months ago • written 10 months ago by CE10

Can you post this as a new question with some example colData? That will help me give the best advice.