Search
Question: Deseq2 - Time-series data with 2 treatments and 1 control
0
8 months ago by

Dear all,

I posted a question (DESeq2 unequal sample sizes) some weeks ago regarding my experiment and sample sizes but received the suggestion to go ask a statistician. I've since struggled with the interpretation of the many LRT models possible and to get some of them to run since the design is not "full rank model". Unfortunately, not everyone in science has the luxury of knowledgeable and helpful statisticians on stand-by, so I'm posting this time with a specific question regarding my setup including more details in the hope that someone will offer some friendly and valuable advice.

I have a time-course experiment with 2 treatment groups and 1 control group. Treatment A has 9 individuals, Treatment B has 5, and Control has 5. Day 0 constitutes the time point before any intervention was done (i.e. both treatment groups and control should be similar at this time point). At days 1, 2, and 3 we expect differences against the control and between treatments.

 Sample Individual Treatment Timepoint x1 1 A Day 0 x2 2 A Day 0 x3 3 A Day 0 x4 4 A Day 0 x5 5 A Day 0 x6 6 A Day 0 x7 7 A Day 0 x8 8 A Day 0 x9 9 A Day 0 x10 10 B Day 0 x11 11 B Day 0 x12 12 B Day 0 x13 13 B Day 0 x14 14 B Day 0 x15 15 Control Day 0 x16 16 Control Day 0 x17 17 Control Day 0 x18 18 Control Day 0 x19 19 Control Day 0 x20 1 A Day 1 ... ... ... ...

(repeated for all 4 time points)

I want to test:

1. A vs Ctrl at each time point
2. B vs Ctrl at each time point

In other words, which genes are different from the controls at day 1, at day 2 and at day 3? I know how to do that with the Wald test and extract results, but the problem then is that the A comparison yields ~40% more significant genes than B solely because it has a higher sample size (i.e. if I remove four individuals from A I get 40% fewer significant genes).

I also want to test:

3. A vs B at each time point.

I.e. how does A differ from B at each day?

So my very first setup in DESeq2 looked like this:

design = ~ individual + timepoint + treatment

Since then, I've been struggling with different models for LRT, and tried to include a "treatment:timepoint" interaction term because other time studies seem to do so. Some yields errors like "not full rank", apparently I think because there are different individuals in each treatment. But I'm still not sure if the LRT is what I need, and if I can use Wald, how do I deal with the different sample sizes? And how do I make sure in the model that day 0 actually constitutes "no treatment" for all groups? Any ideas are most welcome!

modified 8 months ago by Michael Love19k • written 8 months ago by glados0
2
8 months ago by
Michael Love19k
United States
Michael Love19k wrote:

Regarding the point about having more genes with reported DE due to sample size, this is unavoidable for all statistical methods. You can't help that the power is higher when you have more sample size. Removing samples isn't looked upon as a good solution, as you lose information. There isn't a statistical solution to this once you've collected the data and finished the experiment.

Since you don't have access to a local statistician, and the LRT and interaction material in the vignette is not helping clarify the use of these, I think the easiest approach for you is to break into a separate dataset (a separate 'dds') for each day, and just use a design of ~treatment.

Then you can run the following code and build three separate results tables for each day:

dds <- DESeq(dds)
resA <- results(dds, contrast=c("treatment", "A", "Ctrl"))
resB <- results(dds, contrast=c("treatment", "B", "Ctrl"))
resBA <- results(dds, contrast=c("treatment", "B", "A"))

Thank you Michael. Do I lose out on anything by not running interactions or the LRT? I figured with the LTR I will know if genes change over time, but I will get that answer too if I use Wald test for each time point. The sample size still bothers me though, especially since the effect is so large. :/