I have an experiment similar to the one in "Design formula and Design matrix in DESeq".
This one might also be useful for comparison: "DESEq2 comparison with mulitple cell types under 2 conditions".
I have two genotypes (Mut vs. Ctr) and 4 time points, with 8 biological replicates per genotype and time point. Because I am interested in the interactions, I ran my model as suggested in the manual and in similar posts:
dge <- DESeqDataSetFromMatrix(countData = counts, colData = phenomodel1, design = ~ group)
dgeWALD <- DESeq(dge)
where group is Ctr_time1, Ctr_time2, Ctr_time3, Ctr_time4, Mut_time1, Mut_time2, Mut_time3, Mut_time4.
This works well for extracting the genotype effect at each time point. For example, the genotype effect at time 1:
Waldresults1 <- results(dgeWALD, contrast = c("group", "Mut_time1", "Ctr_time1"))
To extract the overall effect of genotype (the average of the effects at times 1, 2, 3, and 4), I used the following:
Waldresultsgenotype <- results(dgeWALD,
    contrast = list(c("Mut_time1", "Mut_time2", "Mut_time3", "Mut_time4"),
                    c("Ctr_time1", "Ctr_time2", "Ctr_time3", "Ctr_time4")),
    listValues = c(1/4, -1/4))  # 1/4 and -1/4 average the four differences; c(1, -1) would sum them
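To see what listValues does to the reported log2 fold change, here is a base-R sketch with made-up group means for a single gene (the numbers are invented purely for illustration, and the numeric contrast over one coefficient per group is an assumption that mimics a cell-means parameterization). According to the DESeq2 documentation, a list contrast multiplies the log2 fold changes in the first vector by listValues[1] and those in the second by listValues[2], so c(1, -1) gives the sum of the four per-time-point differences and c(1/4, -1/4) gives their average.

```r
# Made-up mean log2 expression per group for a single gene (illustration only)
groups <- c("Ctr_time1", "Ctr_time2", "Ctr_time3", "Ctr_time4",
            "Mut_time1", "Mut_time2", "Mut_time3", "Mut_time4")
beta <- setNames(c(5, 6, 7, 8, 6, 8, 10, 12), groups)

is_mut <- startsWith(groups, "Mut")

# Numeric contrast mimicking listValues = c(1, -1): +1 for every Mut, -1 for every Ctr
sum_contrast <- ifelse(is_mut, 1, -1)
# Numeric contrast mimicking listValues = c(1/4, -1/4)
avg_contrast <- sum_contrast / 4

# Mut - Ctr at each time point (here 1, 2, 3, 4)
per_time_diff <- beta[is_mut] - beta[!is_mut]

lfc_sum <- sum(sum_contrast * beta)  # 10: sum of the four differences
lfc_avg <- sum(avg_contrast * beta)  # 2.5: mean of the four differences
```

Multiplying a contrast by a positive constant rescales the estimate and its standard error together, so in a Wald test the statistic and p-value should not change; only the size of the reported log2 fold change does.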
Now I am also interested in the effect of genotype across aging, and in the aging effect in general, so I am trying to use the likelihood ratio test (LRT). However, after reading the manuals and searching for similar posts, I still don't understand how to run it. Here are my specific questions:
From my understanding, I should run something like this:
dgeLRT_genotype_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~age)
for the effect of genotype across aging (i.e. I remove the effect of age and am left with the effect of genotype), and
dgeLRT_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~genotype)
for the effect of age, independently of genotype (i.e. I remove the effect of genotype and am left with the effect of aging).
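For the two main-effect tests written above, one common choice of full model (the pattern used in the DESeq2 vignette's LRT examples) is the additive ~ genotype + age, with each reduced model dropping one term; this assumes the genotype effect is roughly the same at every age. The base-R sketch below uses a made-up balanced sample table (2 genotypes x 4 ages x 8 replicates) and only counts design-matrix ranks to show how many coefficients each LRT removes, i.e. its degrees of freedom. The factor names genotype and age are assumptions matching the formulas above, and the DESeq2 lines in the comments are an untested sketch.

```r
# Made-up balanced sample table: 2 genotypes x 4 ages x 8 replicates (64 rows)
samples <- expand.grid(
  rep      = 1:8,
  age      = factor(c("6m", "8m", "10m", "12m"), levels = c("6m", "8m", "10m", "12m")),
  genotype = factor(c("WT", "TG"), levels = c("WT", "TG"))
)

# Degrees of freedom of an LRT = rank(full design) - rank(reduced design)
lrt_df <- function(full, reduced, data) {
  qr(model.matrix(full, data))$rank - qr(model.matrix(reduced, data))$rank
}

# Genotype effect: drop genotype from the additive model (1 coefficient removed)
df_genotype <- lrt_df(~ genotype + age, ~ age, data = samples)
# Age effect: drop age from the additive model (3 coefficients removed)
df_age <- lrt_df(~ genotype + age, ~ genotype, data = samples)

# Assumed DESeq2 usage, if colData(dge) has genotype and age columns (not run here):
# design(dge) <- ~ genotype + age
# dgeLRT_genotype <- DESeq(dge, test = "LRT", reduced = ~ age)
# dgeLRT_age      <- DESeq(dge, test = "LRT", reduced = ~ genotype)
```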
But my full model was ~group (which encodes the interaction), so I don't understand how to define the full and reduced models. In every example of the LRT I have seen, the full model is something like ~condition + genotype, but that would test the effect of genotype while controlling for the effect of condition, right? That is not what I am interested in.
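One way to see how ~group relates to an interaction model: when group has one level per genotype-age combination, ~group and ~ genotype + age + genotype:age span the same column space (eight coefficients, same fitted values), so in principle either can serve as the full model, and dropping the interaction gives the reduced model ~ genotype + age. Writing the interaction explicitly keeps the nesting of the reduced model obvious. This base-R sketch checks the equivalence by comparing matrix ranks on a made-up balanced table; the factor names are assumptions, and the DESeq2 lines in the comments are an untested sketch.

```r
# Made-up balanced sample table: 2 genotypes x 4 ages x 8 replicates
samples <- expand.grid(
  rep      = 1:8,
  age      = factor(c("6m", "8m", "10m", "12m"), levels = c("6m", "8m", "10m", "12m")),
  genotype = factor(c("WT", "TG"), levels = c("WT", "TG"))
)
# One level per genotype-age combination, like the 8-level group factor
samples$group <- interaction(samples$genotype, samples$age, sep = "_")

X_group <- model.matrix(~ group, samples)                          # one coefficient per group
X_inter <- model.matrix(~ genotype + age + genotype:age, samples)  # explicit interaction
X_add   <- model.matrix(~ genotype + age, samples)                 # reduced: no interaction

rank_of <- function(X) qr(X)$rank

# Same column space: stacking the two full designs does not raise the rank
same_space <- rank_of(X_group) == rank_of(X_inter) &&
  rank_of(cbind(X_group, X_inter)) == rank_of(X_group)

# The interaction LRT removes this many coefficients (3 here)
df_interaction <- rank_of(X_inter) - rank_of(X_add)

# Assumed DESeq2 usage, if colData(dge) has genotype and age columns (not run here):
# design(dge) <- ~ genotype + age + genotype:age
# dgeLRT_interaction <- DESeq(dge, test = "LRT", reduced = ~ genotype + age)
```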
Therefore, my question is: how exactly should I define the following for the LRT?
a) DESeqDataSetFromMatrix (colData and design)
b) DESeq LRT (full and reduced)
If you know of any posts about similar studies, I would really appreciate it if you could point me to them.
Thank you in advance!
As requested in another post, here is the colData for the design above:
> colData(dge)
DataFrame with 63 rows and 1 column
        group
     <factor>
A17    A_WT6m
A18    E_TG6m
A19    B_WT8m
A20    F_TG8m
A21   C_WT10m
...       ...
J20   G_TG10m
J21   D_WT12m
J22   H_TG12m
J23    A_WT6m
J24    E_TG6m
Here A17, A18, ... are the sample IDs, and A_WT6m, B_WT8m, C_WT10m, D_WT12m, E_TG6m, F_TG8m, G_TG10m, H_TG12m are the groups (I merged the genotype, "WT" or "TG", with the age, "6m", "8m", "10m", or "12m"; the A-H prefixes are there for practical purposes when using R).
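For question a), one way to get separate genotype and age columns without retyping the sample sheet is to split the existing group labels. This base-R sketch assumes every label follows the "letter_genotype+age" pattern shown above (e.g. A_WT6m, G_TG10m); the names genotype, age, and coldata are illustrative choices, and the DESeq2 lines in the comments are an untested sketch.

```r
# Group labels in the format shown in colData(dge) above
group <- factor(c("A_WT6m", "E_TG6m", "B_WT8m", "F_TG8m",
                  "C_WT10m", "G_TG10m", "D_WT12m", "H_TG12m"))

# Drop the "A_" to "H_" prefix, leaving e.g. "WT6m" or "TG10m"
label <- sub("^[A-H]_", "", as.character(group))

# First two characters are the genotype (WT as reference level), the rest is the age
genotype <- factor(substr(label, 1, 2), levels = c("WT", "TG"))
age      <- factor(substring(label, 3), levels = c("6m", "8m", "10m", "12m"))

coldata <- data.frame(group = group, genotype = genotype, age = age)
print(table(coldata$genotype, coldata$age))  # one label per combination in this toy input

# Assumed usage on the real object (not run here): apply the same steps to
# dge$group, store the results as dge$genotype and dge$age, then e.g.
# design(dge) <- ~ genotype + age + genotype:age
```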