Hello,
I have a fair amount of experience using DESeq for relatively simple study designs (ie: ~ batch + genotype to look for effect of genotype while controlling for batch) but I am having difficulties deciding on the best formula to address more complicated questions.
Specifically, I am using DESeq to analyze changes in gene expression for 16 samples, with the following design structure:
genotype batch immortalized
kko1_new KKO new yes
kko2_new KKO new yes
kko1_old KKO old no
kko2_old KKO old no
wt1_new WT new no
wt2_new WT new no
wt1_old WT old no
wt2_old WT old no
P4D7mDox non new no
P4C11mDox non new no
P4B3mDox non new no
P3E12mDox non new no
P4D7pDox ind new no
P4C11pDox ind new no
P4B3pDox ind new no
P3E12pDox ind new no
In this case, genotype represents either cells: 1) KO for my gene of interest (KKO) 2) WT for my gene of interest (WT) 3) rescued (non) 4) O.E. (ind)
Ultimately, I want to know how gene expression varies across genotype while controlling for 2 different sources of noise. Ie: How does my GOI regulate gene expression? The first source of noise is batch effects due to experiments (4 samples were prepared initially: "old"; 12 samples were prepared later "new"), and the second is immortalization status. In between "old" and "new" experiments, the KKO cell line underwent a dramatic change in morphology such that I believe there is additional noise that was introduced in addition to variation due to batch effect, which is why I feel that I can't simply resolve this issue by comparing gene expression across "new" samples only.
In this case, would a design formula such as below work, or is it too simple for the question I want to ask? Would it also be beneficial to add an interaction term?
design = ~ batch + immortalized + genotype
Or would it be more appropriate to try an LRT comparison? If so, something like the following?
design = ~batch + immortalized + genotype
gdds_LRT <- DESeq(gdds_LRT, test="LRT", reduced = ~ batch + immortalized)
Any help is greatly appreciated.