Hi,
I have been reading through DESeq2-related questions to find out whether the design I had in mind is suitable, but I could not find a close enough match (or at least none that was obvious to me).
The data I am interested in consists of two subgroups (defined by their genotype, A or B) of individuals to whom a drug or a placebo is administered. There is a before (0) and after (1) time point and we have a response indicator (did the individual die or survive?). Here is a sample of the data (there is more, but I have not included it for the sake of conciseness):
ID | Genotype | Treatment | Time | Response |
---|---|---|---|---|
1 | A | None | 0 | 1 |
1 | A | Placebo | 1 | 1 |
2 | A | None | 0 | 1 |
2 | A | Drug | 1 | 1 |
3 | B | None | 0 | 0 |
3 | B | Placebo | 1 | 0 |
4 | B | None | 0 | 1 |
4 | B | Drug | 1 | 1 |
Main focus of the study
At first, without taking into account the response variable, the questions of interest would be:
1. How does the genotype affect gene expression over time? To answer this, I was thinking of using the design `~ Genotype + Time + ID + Treatment + Genotype:Time + Genotype:Treatment`. The idea is that I expect the effect of genotype on gene expression to differ between Time 0 and Time 1, and also between Placebo and Drug. I would then run a likelihood ratio test (LRT) in which the reduced model is the full model minus the `Genotype:Time` interaction term.
2. How does the genotype affect gene expression across the different treatments? I am assuming that the design from the previous question, combined with an LRT (this time with a reduced model that drops the `Genotype:Treatment` interaction term), would be suitable.
3. At time point 0, what is the difference in gene expression between the genotypes? As Time 0 is the baseline and no treatment has been administered yet, a simpler design such as `~ Genotype` could be used on data filtered to keep only the Time 0 samples.
4. At time point 1, what is the difference in gene expression between the genotypes? This gets more complicated, as some individuals have received a placebo and others the drug. If the data is filtered to keep only the Time 1 samples, the design `~ Genotype + Treatment + Genotype:Treatment` could presumably help in answering the question. This would be followed by an LRT with a reduced model that drops the interaction term. As an alternative, Treatment and Genotype could be combined into a single variable, as described in the Interactions section of the vignette.
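To make the single-variable alternative concrete, here is a minimal base R sketch on the Time 1 rows of the sample table. The object names (`coldata`, `group`, `design_group`) are my own, and the commented DESeq2 lines assume a `DESeqDataSet` called `dds` that already holds the Time 1 samples; they are not run here.

```r
# Time 1 rows of the sample table (4 individuals, one sample each).
coldata <- data.frame(
  ID        = factor(c(1, 2, 3, 4)),
  Genotype  = factor(c("A", "A", "B", "B")),
  Treatment = factor(c("Placebo", "Drug", "Placebo", "Drug"),
                     levels = c("Placebo", "Drug"))
)

# Combine Genotype and Treatment into one factor, e.g. "A_Placebo", "B_Drug".
coldata$group <- factor(paste(coldata$Genotype, coldata$Treatment, sep = "_"))
group_levels <- levels(coldata$group)

# One coefficient per group level (intercept plus three contrasts to the
# reference level).
design_group <- model.matrix(~ group, data = coldata)

# Assumed DESeq2 usage, with `dds` built from the Time 1 samples (not run):
# dds$group <- coldata$group
# design(dds) <- ~ group
# dds <- DESeq(dds)
# results(dds, contrast = c("group", "B_Drug", "A_Drug"))
```

With the combined factor, each genotype comparison within a treatment becomes a single contrast between two levels of `group`. The four toy rows leave no replicates per group, so this only illustrates the layout; the full data has more individuals.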
For this first part, I have two questions: is there a way to change the input data so that it makes the problem simpler (such as relabelling the values from the Treatment column)? Would these designs be suitable to answer the questions of interest?
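For the suitability question, one check I could run on the first design (the one that includes ID) is whether its model matrix is full rank. The sketch below uses base R only on the eight sample rows shown above; the object names are my own. My suspicion is that ID is nested within Genotype and that "None" only occurs at Time 0, which would make some columns linear combinations of others.

```r
# The eight sample rows from the table above.
coldata <- data.frame(
  ID        = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  Genotype  = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  Treatment = factor(c("None", "Placebo", "None", "Drug",
                       "None", "Placebo", "None", "Drug"),
                     levels = c("None", "Placebo", "Drug")),
  Time      = factor(c(0, 1, 0, 1, 0, 1, 0, 1))
)

# Model matrix for the first proposed design.
full <- model.matrix(
  ~ Genotype + Time + ID + Treatment + Genotype:Time + Genotype:Treatment,
  data = coldata
)

n_coef      <- ncol(full)      # number of coefficients to estimate
design_rank <- qr(full)$rank   # number of linearly independent columns

# Two structural dependencies that do not depend on the sample size:
# the genotype B column is the sum of the ID columns of its individuals,
genotype_nested_in_id <- all(full[, "GenotypeB"] ==
                             full[, "ID3"] + full[, "ID4"])
# and the Time 1 column is the sum of the Placebo and Drug columns.
time_equals_treatment <- all(full[, "Time1"] ==
                             full[, "TreatmentPlacebo"] + full[, "TreatmentDrug"])

print(c(coefficients = n_coef, rank = design_rank))
print(c(genotype_nested_in_id = genotype_nested_in_id,
        time_equals_treatment = time_equals_treatment))
```

If both dependencies hold on the full sample table as well, I would expect the design as written to be rejected as not full rank, which is part of why I am asking whether relabelling Treatment or handling ID differently would help.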
Follow-up question
Now there is the matter of adding the response variable to the mix. Two questions I would like to answer are:
1. Among the individuals who survive, is there a difference in gene expression patterns over time between genotype A and genotype B?
2. Is there a difference between survivors and non-survivors in gene expression patterns over time that is specific to the genotype?
These are questions I am still thinking about so would be grateful for leads, such as whether using DESeq2 would help in answering these questions, and more specifically, would any of the above designs be useful to do so?
Thank you.
Thanks, Michael, for answering and for providing some pointers.