Question

Best formula to use for this experimental setup

bdg • 0

Hi,

I have an experimental setup for a time series as follows:

Treatment: time1 time2 time3 time4 time5

Control: time0 time4

So I have a time series in which the cells are treated after time 0 and RNA-seq is performed at a set of later time points. The control at time 4 is there to check for circadian changes over the time course (this is not the main aim of the experiment). I am trying to find genes whose expression changes over time in response to treatment. If I ignore the control sample at time4, this reduces to a simple ANOVA/LRT with the full model ~ time and the reduced model ~ 1.

A simple pairwise differential expression analysis between control at time0 and control at time4 reveals only a few differentially expressed genes (and they're all circadian cycle related). The control time4 samples also cluster strongly with the control time0 samples.

Is it OK for me to merge these samples with the samples at time 0? Are there any problems that will arise from this, or any further checks that I need to perform before I can definitely collapse this group down?

However, I feel this may not be optimal, as I'm throwing away information from the control at time4, which in some ways probably means I'm overestimating the variance for some of the genes by combining the samples as suggested above. Is there a better model that takes this into account and that I am not considering?

Many thanks for your help,

deseq2 differential gene expression design • 1.2k views

updated 10.6 years ago by Michael Love 43k • written 10.6 years ago by bdg • 0

score 1 · Answer 1 · 2015-06-28

The 'full' design must allow each group to have its own fitted value (all groups can differ from each other due to time, treatment, or the interaction of time and treatment). How to formulate this is a small matter, but first we must resolve a bigger question:

What should the null model be? It makes sense that the time4 samples (treated and control) would be in the same group, under the assumption that the treatment has no effect. But should the time1-time3 and time5 samples be grouped with time0, and the time4 samples be in a second group? Or should the time5 samples group with the time4 samples, and the time0-time3 samples be in a second group? Or should the null be that all the samples are in the same group (a design of ~1), so that we ignore time-based differences between the two groups of control samples (time0 and time4)?
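To make the three candidate nulls concrete, here is a minimal Python sketch (plain standard library, not DESeq2 code) that writes each one as a mapping from the seven sample groups to null groups and counts how many coefficients the LRT would test against a full model with one mean per group. The group labels, the `null_from` helper and the names of the candidate nulls are made up for illustration, and the seven-group full model is an assumption based on the layout in the question.

```python
# Seven sample groups from the question's layout (labels are illustrative).
groups = ["control_time0", "control_time4"] + [f"treated_time{t}" for t in range(1, 6)]

def null_from(second_group):
    """Hypothetical helper: put the listed groups in null group 'B', the rest in 'A'."""
    return {g: ("B" if g in second_group else "A") for g in groups}

candidate_nulls = {
    # time4 (treated and control) on its own; time1-time3 and time5 join time0
    "time4 separate": null_from({"control_time4", "treated_time4"}),
    # time5 joins time4; time0-time3 form the other group
    "time5 with time4": null_from({"control_time4", "treated_time4", "treated_time5"}),
    # every sample in one group, i.e. a reduced design of ~1
    "single group": null_from(set()),
}

# Full model: one fitted mean per group, so one coefficient per group.
full_params = len(groups)

# LRT degrees of freedom = coefficients in full model minus coefficients in null.
lrt_df = {
    name: full_params - len(set(mapping.values()))
    for name, mapping in candidate_nulls.items()
}

for name, df in lrt_df.items():
    print(f"{name}: {df} coefficients tested")
```

Each null is nested in the per-group full model, so any of them could in principle serve as the reduced design; they differ in which between-group differences are treated as "no effect".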

Without control samples at each time point, there are many possibilities for the null, but I'm not sure which makes the most sense for your system.
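On the smaller matter of formulating the full design, one option is a single combined factor with one level per observed condition/time combination. The Python sketch below (standard library only, not DESeq2 code) illustrates why: a condition-by-time factorial with interaction would need a coefficient for every possible cell, but only some cells were sampled. The two replicates per group and all variable names are assumptions for illustration.

```python
from itertools import product

# Observed (condition, time) cells, taken from the layout in the question.
observed = [("control", 0), ("control", 4)] + [("treated", t) for t in range(1, 6)]

conditions = sorted({c for c, _ in observed})
times = sorted({t for _, t in observed})

# A factorial design with interaction needs one coefficient per possible cell.
possible = list(product(conditions, times))
missing = [cell for cell in possible if cell not in observed]

# One label per observed cell: the levels of a single combined factor.
group_levels = [f"{c}_time{t}" for c, t in observed]

# Hypothetical sample sheet with 2 replicates per group (replicate count is assumed).
samples = [(g, rep) for g in group_levels for rep in (1, 2)]

# One-hot design matrix: one column per group, so each group gets its own fitted mean.
design = [[1 if g == level else 0 for level in group_levels] for g, _ in samples]

print(f"{len(possible)} possible cells, {len(observed)} observed, {len(missing)} missing")
print(group_levels)
```

In DESeq2 terms this would correspond to a factor column in the sample table (say `group`, a hypothetical name) and a full design of `~ group`, with the chosen null supplied as the reduced design for the LRT.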