I am working with time-series RNA-seq data, and would like to make sure I have the right design for my purpose.
The experiments like:
Time = rep(c("0h", "2h", "4h", "6h", "8h"), each = 2) Treat = rep(c("Control", "Treat"), each = 10) nameD <- paste(Treat, Time, c(rep(LETTERS[1:2], 5), rep(LETTERS[3:4], 5)), sep = "_") coldata <- data.frame(row.names = nameD, Time = Time, Treat = Treat)
> coldata Time Treat Control_0h_A 0h Control Control_0h_B 0h Control Control_2h_A 2h Control Control_2h_B 2h Control Control_4h_A 4h Control Control_4h_B 4h Control Control_6h_A 6h Control Control_6h_B 6h Control Control_8h_A 8h Control Control_8h_B 8h Control Treat_0h_C 0h Treat Treat_0h_D 0h Treat Treat_2h_C 2h Treat Treat_2h_D 2h Treat Treat_4h_C 4h Treat Treat_4h_D 4h Treat Treat_6h_C 6h Treat Treat_6h_D 6h Treat Treat_8h_C 8h Treat Treat_8h_D 8h Treat
This is longitudinal experiments, two biological replicates for treat and control respectively, and each sample is repeated sequenced in different time. Besides, treatments started before 0 hour, 0 hour only mean when the samples were sequenced.
I would like to find genes showing difference in any time point between Treat and control. I think the likelihood test is right choice.
The full model I used as
design(dds) <- formula(~ Time + Treat + Time:Treat)
the reduced model I used is different from Vignettes of DESeq2, I used
reduced_1 = formula(~ Time)
reduced_2 = formula(~ Time + Treat)
I think likelihood test with reduced_2 could not give small pvalue to genes showing consistent difference between treat and control, i.e., genes expression are parallel between two condition.
A few questions:
- Am I right to use reduced_1? (From the discussion and comments I think it fits my purpose).
- Because it is repeated measurements for each sample, the genes expression also correlate. Would it be a problem for the test?
- I have tried two test so far:
- Setting one (do not consider interaction): full = formula(~ Time + Treat), reduced = formula(~ Time)
- Setting two (consider interaction): full = formula(~ Time + Treat + Time:Treat), reduced = formula(~ Time)
- To my surprise, setting one actually find more DE genes than setting two basing on FDR <= 0.01. I noticed that fold change is small between treat and control, is it connected to the model complexity and lead to less DE genes? or better reasons?
Thanks in advance.