I'm something of a neophyte to DESeq2 and want to be sure I'm setting up my analysis appropriately given my sample groups.
First, some background: I have a gene expression dataset that captures five distinct age stages (1 through 5, 3 replicates apiece) as well as 2 clear behavioral states from the earliest and latest age stages (i.e. States A and B from stage 1, 4 replicates apiece; States A and B from stage 5, 5 replicates apiece). I can thus address questions about aging overall as well as the effect of age on behavioral state.
Thus far, I've been splitting the full sample set into separate GLM runs to approach each question independently (i.e. Age: stages 1-5 together in run 1; Behavior: States A and B from stages 1 and 5 together in run 2). I am wondering whether this is acceptable, or whether it would be more statistically appropriate to combine all samples, from both age and behavior, in a single GLM, using interaction terms to assign compound conditions to each sample, and to extract results from this larger setup.
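To make the combined setup concrete, here is a minimal base-R sketch of what I have in mind. It assumes the 15 age-series samples are separate from the 18 behavioral samples and carry no behavioral label (I call that level "none"); the column names age, state and group are placeholders of my own. As far as I can tell, a true interaction design is not full rank here because most age/state combinations were never sampled, so the sketch also builds a single combined group factor as an alternative.

```r
# Hypothetical sample table: 15 age-series samples followed by 18 behavioral samples.
age <- c(rep(paste0("Time", 1:5), each = 3),   # age series, 3 replicates per stage
         rep("Time1", 8), rep("Time5", 10))    # behavioral samples from stages 1 and 5
state <- c(rep("none", 15),                    # assumption: age-series samples have no state
           rep(c("A", "B"), each = 4),         # stage 1: 4 replicates per state
           rep(c("A", "B"), each = 5))         # stage 5: 5 replicates per state

coldata <- data.frame(age   = factor(age),
                      state = factor(state, levels = c("none", "A", "B")))

# One compound condition per sample, e.g. "Time1_none", "Time1_A", "Time5_B".
coldata$group <- factor(paste(coldata$age, coldata$state, sep = "_"))

# Compare the two candidate designs via their model matrices.
mm_inter <- model.matrix(~ age + state + age:state, coldata)
mm_group <- model.matrix(~ group, coldata)

rank_inter <- qr(mm_inter)$rank   # smaller than ncol(mm_inter): rank deficient
rank_group <- qr(mm_group)$rank   # equal to ncol(mm_group): full rank

# With DESeq2 this would then be used roughly as follows, assuming the columns of
# total_counts are in the same order as the rows of coldata:
# dds <- DESeqDataSetFromMatrix(countData = total_counts, colData = coldata, design = ~ group)
# dds <- DESeq(dds)
# res <- results(dds, contrast = c("group", "Time5_A", "Time1_A"))
```

If this is right, each question would become a contrast between two levels of group, rather than a coefficient of an interaction term.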
What follows is an example setup for my 'single-question' GLM addressing age:
# Sample table: five time points, three replicates each, in column order of total_counts
expt_design <- data.frame(rows = colnames(total_counts),
                          condition = rep(paste0("Time", 1:5), each = 3))
dds <- DESeqDataSetFromMatrix( countData = total_counts, colData = expt_design, design = ~ condition)
# DESeq() already runs estimateSizeFactors(), estimateDispersions() and nbinomWaldTest(),
# so those steps do not need to be called again afterwards
dds <- DESeq(dds)
# Log2 fold changes are reported as Time1 relative to Time2
Time1vsTime2 <- results(dds, contrast = c("condition", "Time1", "Time2"))
Time1vsTime2 <- as.data.frame(Time1vsTime2)
write.csv(Time1vsTime2, "T01_Time1vsTime2_DESeq2test_09202018.csv", row.names=TRUE)
Thanks very kindly in advance!