Hallo,
I have situation pretty similar to C: DESeq2 likelihood ratio test (LRT) design - 2 genotypes, 4 time points, with the only difference of having 3 genotypes (WT, 2 different KO) and 3 time points (Day0, day4, day7)
>experimental_design line day replicate WT_day0_A WT day0 A WT_day0_B WT day0 B WT_day0_C WT day0 C WT_day4_A WT day4 A WT_day4_B WT day4 B WT_day4_C WT day4 C WT_day7_A WT day7 A WT_day7_B WT day7 B WT_day7_C WT day7 C KO_A_day0_A KO_A day0 A KO_A_day0_B KO_A day0 B KO_A_day0_C KO_A day0 C KO_A_day4_A KO_A day4 A KO_A_day4_B KO_A day4 B KO_A_day4_C KO_A day4 C KO_A_day7_A KO_A day7 A KO_A_day7_B KO_A day7 B KO_A_day7_C KO_A day7 C KO_B_day0_A KO_B day0 A KO_B_day0_B KO_B day0 B KO_B_day0_C KO_B day0 C KO_B_day4_A KO_B day4 A KO_B_day4_B KO_B day4 B KO_B_day7_A KO_B day7 A KO_B_day7_B KO_B day7 B KO_B_day7_C KO_B day7 C
I'm interested in :
1) the DEGs among the different genotypes at any given time point
2) the genes whose temporal expression is perturbed by the genotype
for point 1)
As explained in the linked post and the vignette i can use
dds$group <- factor(paste0(dds$line, dds$day))
and then use the contrast parameter in the results function
WaldresultsA <- results(dgeWALD, contrast=c("group", "KO_A_day0", "WT_day0"))
WaldresultsB <- results(dgeWALD, contrast=c("group", "KO_B_day0", "WT_day0"))
WaldresultsAB <- results(dgeWALD, contrast=c("group", "KO_A_day0", "KO_B_day0"))
and I would repeat it for all the time points.
Is there a faster way that I'm missing to avoid doing 9 pairwise comparison independently?
for point 2)
I'm using a LRT test as suggested for time point analysis with full design ~line + day + line:day
and a reduced design ~line + day
that should identify genes with an altered expression pattern across time. So far so good.
I did so with:
ddsMat <- DESeqDataSetFromMatrix(countData = dataset, colData = experimental_design, design = ~line+day+line:day) ddsMatLRT <- DESeq(ddsMat, test = "LRT", reduced = ~line+day) res_LRT_IHW <- results(ddsMatLRT, filterFun=ihw)
What I'm unsure about is how interpret the results:
> head(res_LRT_IHW) log2 fold change (MLE): lineKO_B.dayday7 LRT p-value: '~ line + day + line:day' vs '~ line + day' DataFrame with 6 rows and 7 columns baseMean log2FoldChange lfcSE stat <numeric> <numeric> <numeric> <numeric> ENSMUSG00000051951.5 22.5165766 -1.6236271 0.975098 39.1489484 ENSMUSG00000103377.1 0.8807975 -3.7877553 3.469949 3.4807201 ENSMUSG00000104017.1 0.5717264 -1.2395309 4.924453 0.6416225 ENSMUSG00000103201.1 0.7480610 -1.7446142 8.775683 1.7322564 ENSMUSG00000103161.1 2.0269348 -1.8764214 2.680503 5.6762745 ENSMUSG00000102331.1 5.0319818 0.2647875 2.085465 2.2124272 pvalue padj weight <numeric> <numeric> <numeric> ENSMUSG00000051951.5 6.489969e-08 3.847244e-07 1.7927481 ENSMUSG00000103377.1 4.808159e-01 1.000000e+00 0.0000000 ENSMUSG00000104017.1 9.583281e-01 1.000000e+00 0.0000000 ENSMUSG00000103201.1 7.848502e-01 1.000000e+00 0.0000000 ENSMUSG00000103161.1 2.246638e-01 4.802901e-01 0.8586256 ENSMUSG00000102331.1 6.967548e-01 6.594048e-01 1.7711035
my questions are:
what does this line means?
log2 fold change (MLE): lineKO_B.dayday7
reported fold change only goes for line KO_B at day 7? and compared to? KO_B day4, KO_B day 0 or WT day 7? Does it apply to pvalues too?
How do I know which KO line has the deregulated expression that triggers a low pvalue?
How do I know if this difference was detected in one or more time points?
How do I find which one they are?
Does it need to be done one comparison at a time using the contrast option? how? (I hope not, my idea was exactly to avoid performing an endless list of pairwise comparisons..)
also, I had a look at this
> resultsNames(ddsMatLRT) [1] "Intercept" "line_KO_A_vs_WT" "line_KO_B_vs_WT" [4] "day_day4_vs_day0" "day_day7_vs_day0" "lineKO_A.dayday4" [7] "lineKO_B.dayday4" "lineKO_A.dayday7" "lineKO_B.dayday7"
isn't it missing a day7 vs day4 comparison? I guess not, so most probably I am the one missing something..
I understand that these are probably questions arising from a lack of understanding of how the results()
function works (at least). I tried to follow the vignette, the workflow that is linked in it at the section for time courses, and answers to other questions but I can't wrap my head around it..
PS
For a comparison between the two KO, should I relevel the design on one of them or the comparison is already in the generated table?
Thanks a LOT!
Matteo