Hello! I am having a difficult time with the design element of my Deseq2 timecourse following the vignette in (Love MI, Anders S, Kim V and Huber W. RNA-Seq workflow: gene-level exploratory analysis and differential expression [version 1; referees: 2 approved]. F1000Research 2015, 4:1070 (doi: 10.12688/f1000research.7035.1)) as a template.
My experiment is looking at Controls (wild type mice) Vs Experimental (knock out) with 4 biological replicates each, at 4 time points (p0, p7, p15, p30) These are not repeated measures on the same group of mice (each mouse was sac'ed at each timepoint). My hope is to look at condition specific differences over time. Here is what I have so far:
sampleFiles<- c("CB-1712-P0-M-KO.counts.processed.txt",
"CB-1713-P0-M-WT.counts.processed.txt",
"CB-1741-P0-M-WT.counts.processed.txt",
"CB-1620-P0-M-WT.counts.processed.txt",
"CB-1650-P0-M-KO.counts.processed.txt",
"CB-1700-P0-M-WT.counts.processed.txt",
"CB-1702-P0-M-KO.counts.processed.txt",
"CB-1704-P0-M-KO.counts.processed.txt",
"CB-1171-P7-M-WT.counts.processed.txt",
"CB-1185-P7-M-KO.counts.processed.txt",
"CB-1191-P7-M-KO.counts.processed.txt",
"CB-1330-P7-M-WT.counts.processed.txt",
"CB-1370-P7-M-WT.counts.processed.txt",
"CB-1380-P7-M-WT.counts.processed.txt",
"CB-1381-P7-M-KO.counts.processed.txt",
"CB-1414-P7-M-KO.counts.processed.txt",
"CB-1547-P15-M-WT.counts.processed.txt",
"CB-1210-P15-M-KO.counts.processed.txt",
"CB-1220-P15-M-WT.counts.processed.txt",
"CB-1223-P15-M-KO.counts.processed.txt",
"CB-1231-P15-M-KO.counts.processed.txt",
"CB-1233-P15-M-WT.counts.processed.txt",
"CB-1540-P15-M-KO.counts.processed.txt",
"CB-1542-P15-M-WT.counts.processed.txt",
"CB-CM847-P30-M-WT.counts.processed.txt",
"CB-1255-P30-M-KO.counts.processed.txt",
"CB-1265-P30-M-WT.counts.processed.txt",
"CB-1266-P30-M-KO.counts.processed.txt",
"CB-1341-P30-M-KO.counts.processed.txt",
"CB-CM838-P30-M-WT.counts.processed.txt",
"CB-CM839-P30-M-KO.counts.processed.txt",
"CB-CM846-P30-M-WT.counts.processed.txt")
sampleNames <- c("CB-1712-P0-M-KO",
"CB-1713-P0-M-WT",
"CB-1741-P0-M-WT",
"CB-1620-P0-M-WT",
"CB-1650-P0-M-KO",
"CB-1700-P0-M-WT",
"CB-1702-P0-M-KO",
"CB-1704-P0-M-KO",
"CB-1171-P7-M-WT",
"CB-1185-P7-M-KO",
"CB-1191-P7-M-KO",
"CB-1330-P7-M-WT",
"CB-1370-P7-M-WT",
"CB-1380-P7-M-WT",
"CB-1381-P7-M-KO",
"CB-1414-P7-M-KO",
"CB-1547-P15-M-WT",
"CB-1210-P15-M-KO",
"CB-1220-P15-M-WT",
"CB-1223-P15-M-KO",
"CB-1231-P15-M-KO",
"CB-1233-P15-M-WT",
"CB-1540-P15-M-KO",
"CB-1542-P15-M-WT",
"CB-CM847-P30-M-WT",
"CB-1255-P30-M-KO",
"CB-1265-P30-M-WT",
"CB-1266-P30-M-KO",
"CB-1341-P30-M-KO",
"CB-CM838-P30-M-WT",
"CB-CM839-P30-M-KO",
"CB-CM846-P30-M-WT")
Genotype <- c("Exp","Control","Control","Control","Exp",
"Control","Exp","Exp","Control","Exp","Exp","Control",
"Control","Control","Exp","Exp","Control","Exp",
"Control","Exp","Exp","Control","Exp","Control",
"Control","Exp","Control","Exp","Exp","Control",
"Exp","Control")
time<- c("0","0","0","0","0","0","0","0","7","7","7","7","7","7","7","7",
"15","15","15","15","15","15","15","15",
"30","30","30","30","30","30","30","30")
sampleTable<-data.frame(sampleName=sampleNames, fileName=sampleFiles, condition=Genotype, time=time)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~ condition + time + condition:time)
nrow(ddsHTSeq)
keep <- rowSums(counts(ddsHTSeq)) > 10
dds <- ddsHTSeq[keep,]
nrow(dds)
dds$condition <- relevel(dds$condition, ref = "Control")
ddsTC <- DESeq(dds, test="LRT", reduced = ~ condition + time)
resTC <- results(ddsTC)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),],4)
resultsNames(ddsTC)
"Intercept" "condition_Exp_vs_Control" "time_15_vs_0"
[4] "time_30_vs_0" "time_7_vs_0" "conditionExp.time15"
[7] "conditionExp.time30" "conditionExp.time7"
This seems to not be comparing what I want it to. I would like it to compare controls vs exp with time as an interaction term but it seems that it is not grouping the analysis the way I would like. Any help would be greatly appreciated!
Thanks for your quick response! Is there a part of the manual/vignette that explains the output of "resultsNames(ddsTC)" so I can better understand what comparison each header is referring to? Thanks
The vignette and workflow have quite a bit of information about how to interpret results, there is an interaction section in the vignette.
But I would not recommend to attempt to use interaction models without any background on what interactions do in linear models. At some point, we have to assume that the user is aware of what the model terms are doing. It's much better to consult with a statistical collaborator to make sure you are interpreting the results correctly.