Question

Design for multi-factor problem with DESeq2

lu.ne

Hi,

I have been reading through DESeq2 related questions, trying to find out if the design I had in mind was suitable, but I could not find a close enough match (or at least that was not obvious to me).

The data I am interested in consists of two subgroups (defined by their genotype, A or B) of individuals to whom a drug or a placebo is administered. There is a before (0) and after (1) time point and we have a response indicator (did the individual die or survive?). Here is a sample of the data (there is more, but I have not included it for the sake of conciseness):

ID	Genotype	Treatment	Time	Response
1	A	None	0	1
1	A	Placebo	1	1
2	A	None	0	1
2	A	Drug	1	1
3	B	None	0	0
3	B	Placebo	1	0
4	B	None	0	1
4	B	Drug	1	1
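
For reference, the rows above can be written as an R data frame of the kind usually passed as `colData`. This is only a sketch: the object name `coldata` and the factor level order are assumptions, and only the eight rows shown in the table are included.

```r
# Sample sheet built from the eight rows shown above.
# `coldata` and the level ordering are illustrative assumptions.
coldata <- data.frame(
  ID        = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  Genotype  = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  Treatment = factor(c("None", "Placebo", "None", "Drug",
                       "None", "Placebo", "None", "Drug"),
                     levels = c("None", "Placebo", "Drug")),
  Time      = factor(c(0, 1, 0, 1, 0, 1, 0, 1)),
  Response  = factor(c(1, 1, 1, 1, 0, 0, 1, 1))
)

# Cross-tabulate Treatment against Time to see which combinations occur.
treat_by_time <- table(Treatment = coldata$Treatment, Time = coldata$Time)
print(treat_by_time)
```

In these rows, "None" occurs only at Time 0 and Placebo/Drug only at Time 1, so the Treatment and Time columns partly carry the same information.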

Main focus of the study
At first, without taking the response variable into account, the questions of interest are:

1. How does the genotype affect gene expression over time? To answer this, I was thinking of using the design ~ Genotype + Time + ID + Treatment + Genotype:Time + Genotype:Treatment. The idea is that I expect the effect of genotype on gene expression to differ between Time 0 and Time 1, and also between Placebo and Drug. I would then use a likelihood ratio test (LRT), with the reduced model being the full model minus the interaction term Genotype:Time.
2. How does the genotype affect gene expression across the different treatments? I am assuming the design from the previous question, combined with an LRT (this time with a reduced model that drops the Genotype:Treatment interaction term), would be suitable.
3. At time point 0, what is the difference in gene expression between the genotypes? Time 0 is a baseline time point at which no treatment has been administered, so a simpler design such as ~ Genotype could be used on data filtered to keep only Time 0 samples.
4. At time point 1, what is the difference in gene expression between the genotypes? This is more complicated, as some individuals have received a placebo and others the drug. If the data is filtered to keep only Time 1 samples, the design ~ Genotype + Treatment + Genotype:Treatment might help answer the question, followed by an LRT against a reduced model without the interaction term. Alternatively, Treatment and Genotype could be combined into a single variable, as described in the Interactions section of the vignette.
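
The two Time 1 options described above (an explicit interaction term, or a single combined factor) can be compared with base R's `model.matrix`. This is a minimal sketch, not a full analysis: it uses only the four Time 1 rows from the sample table, so the model is saturated here, and the names `t1`, `group`, `mm_interaction` and `mm_group` are assumptions.

```r
# Time 1 rows of the sample table. Subsetting the real data would also
# require dropping the unused "None" level, e.g. with droplevels().
t1 <- data.frame(
  Genotype  = factor(c("A", "A", "B", "B")),
  Treatment = factor(c("Placebo", "Drug", "Placebo", "Drug"),
                     levels = c("Placebo", "Drug"))
)

# Option 1: main effects plus the Genotype:Treatment interaction.
mm_interaction <- model.matrix(~ Genotype + Treatment + Genotype:Treatment,
                               data = t1)

# Option 2: one combined factor with a level per Genotype/Treatment pair.
t1$group <- factor(paste0(t1$Genotype, "_", t1$Treatment))
mm_group <- model.matrix(~ group, data = t1)

# Inspect the coefficients each parameterisation would estimate.
print(colnames(mm_interaction))
print(colnames(mm_group))
```

Both matrices have the same number of columns and the same rank, so they describe the same model; they differ in which comparisons appear directly as coefficients and which have to be built as contrasts.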

For this first part, I have two questions: is there a way to change the input data so that it makes the problem simpler (such as relabelling the values from the Treatment column)? Would these designs be suitable to answer the questions of interest?

Follow-up question

Now there is the matter of adding the response variable to the mix. Two questions I would like to answer are:

1. Among surviving individuals, is there a difference in gene expression patterns over time between genotype A and genotype B?
2. Is there a difference between survivors and non-survivors in gene expression patterns over time that is specific to the genotype?

I am still thinking about these questions, so I would be grateful for leads: would DESeq2 help in answering them and, more specifically, would any of the above designs be useful for doing so?

Thank you.

deseq2 • 928 views

updated 4.5 years ago by Michael Love • written 4.5 years ago by lu.ne

score 0 · Answer 1 · 2020-10-08

Hi,

Re: "Would these designs be suitable to answer the questions of interest?"

Unfortunately, I'm in a position where I only have time to answer software-related questions on the support site, and can't provide statistical analysis guidance or consult. You may therefore want to consult with a local statistician, as you have a fairly complex experimental design, and you'll want to make sure to perform the analysis correctly and interpret the coefficients accurately.

I will note that you will have to consider how to control for ID baselines while also including genotype-based interactions. This is specifically covered in the interactions part of the DESeq2 vignette; see the section "Group-specific condition effects, individuals nested within groups".
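
As a rough illustration of why this matters, the sketch below uses base R only and the eight sample rows from the question. It first shows that putting Genotype and ID side by side gives a design matrix that is not full rank, then renumbers individuals within each genotype in the way the vignette section describes. The names `coldata`, `mm_naive` and `mm_nested` are assumptions, `ind.n` follows the vignette's naming, and Treatment is left out to keep the example small.

```r
# Sample sheet from the question (Treatment omitted for brevity).
coldata <- data.frame(
  ID       = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  Genotype = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  Time     = factor(c(0, 1, 0, 1, 0, 1, 0, 1))
)

# Each ID belongs to exactly one genotype, so the GenotypeB column is the
# sum of the ID3 and ID4 columns and the matrix loses one rank.
mm_naive <- model.matrix(~ Genotype + ID, data = coldata)
rank_naive <- qr(mm_naive)$rank
cat("naive design: columns =", ncol(mm_naive), ", rank =", rank_naive, "\n")

# Renumber individuals within each genotype (1, 2, ... per group).
coldata$ind.n <- factor(ave(as.integer(coldata$ID), coldata$Genotype,
                            FUN = function(x) as.integer(factor(x))))

# Individual baselines nested within genotype, plus a genotype-specific
# Time effect.
mm_nested <- model.matrix(~ Genotype + Genotype:ind.n + Genotype:Time,
                          data = coldata)
rank_nested <- qr(mm_nested)$rank
cat("nested design: columns =", ncol(mm_nested), ", rank =", rank_nested, "\n")
print(colnames(mm_nested))
```

With these rows the nested design is full rank and contains one Time coefficient per genotype (`GenotypeA:Time1` and `GenotypeB:Time1`). How to bring Treatment and Response into such a design is a separate modelling decision that this sketch does not address.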