Question

Deseq2 - Time-series data with 2 treatments and 1 control

0

Entering edit mode

MrHodor • 0

@hodor-7266

Last seen 3.1 years ago

Dear all,

I posted a question (DESeq2 unequal sample sizes) some weeks ago regarding my experiment and sample sizes but received the suggestion to go ask a statistician. I've since struggled with the interpretation of the many LRT models possible and to get some of them to run since the design is not "full rank model". Unfortunately, not everyone in science has the luxury of knowledgeable and helpful statisticians on stand-by, so I'm posting this time with a specific question regarding my setup including more details in the hope that someone will offer some friendly and valuable advice.

I have a time-course experiment with 2 treatment groups and 1 control group. Treatment A has 9 individuals, Treatment B has 5, and Control has 5. Day 0 constitutes the time point before any intervention was done (i.e. both treatment groups and control should be similar at this time point). At days 1, 2, and 3 we expect differences against the control and between treatments.

`Sample`	`Individual`	`Treatment`	`Timepoint`
`x1`	`1`	`A`	`Day 0`
`x2`	`2`	`A`	`Day 0`
`x3`	`3`	`A`	`Day 0`
`x4`	`4`	`A`	`Day 0`
`x5`	`5`	`A`	`Day 0`
`x6`	`6`	`A`	`Day 0`
`x7`	`7`	`A`	`Day 0`
`x8`	`8`	`A`	`Day 0`
`x9`	`9`	`A`	`Day 0`
`x10`	`10`	`B`	`Day 0`
`x11`	`11`	`B`	`Day 0`
`x12`	`12`	`B`	`Day 0`
`x13`	`13`	`B`	`Day 0`
`x14`	`14`	`B`	`Day 0`
`x15`	`15`	`Control`	`Day 0`
`x16`	`16`	`Control`	`Day 0`
`x17`	`17`	`Control`	`Day 0`
`x18`	`18`	`Control`	`Day 0`
`x19`	`19`	`Control`	`Day 0`
`x20`	`1`	`A`	`Day 1`
`...`	`...`	`...`	`...`

(repeated for all 4 time points)

I want to test:

A vs Ctrl at each time point
B vs Ctrl at each time point

In other words, which genes are different from the controls at day 1, at day 2 and at day 3? I know how to do that with the Wald test and extract results, but the problem then is that the A comparison yields ~40% more significant genes than B solely because it has a higher sample size (i.e. if I remove four individuals from A I get 40% fewer significant genes).

I also want to test:

3. A vs B at each time point.

I.e. how does A differ from B at each day?

So my very first setup in DESeq2 looked like this:

design = ~ individual + timepoint + treatment

Since then, I've been struggling with different models for LRT, and tried to include a "treatment:timepoint" interaction term because other time studies seem to do so. Some yields errors like "not full rank", apparently I think because there are different individuals in each treatment. But I'm still not sure if the LRT is what I need, and if I can use Wald, how do I deal with the different sample sizes? And how do I make sure in the model that day 0 actually constitutes "no treatment" for all groups? Any ideas are most welcome!

deseq2 • 2.8k views

ADD COMMENT • link updated 6.4 years ago by Michael Love 41k • written 6.4 years ago by MrHodor • 0

score 2 · Accepted Answer · 2017-12-07

2

Entering edit mode

Michael Love 41k

@mikelove

Last seen 9 hours ago

United States

Regarding the point about having more genes with reported DE due to sample size, this is unavoidable for all statistical methods. You can't help that the power is higher when you have more sample size. Removing samples isn't looked upon as a good solution, as you lose information. There isn't a statistical solution to this once you've collected the data and finished the experiment.

Since you don't have access to a local statistician, and the LRT and interaction material in the vignette is not helping clarify the use of these, I think the easiest approach for you is to break into a separate dataset (a separate 'dds') for each day, and just use a design of ~treatment.

Then you can run the following code and build three separate results tables for each day:

dds <- DESeq(dds)
resA <- results(dds, contrast=c("treatment", "A", "Ctrl"))
resB <- results(dds, contrast=c("treatment", "B", "Ctrl"))
resBA <- results(dds, contrast=c("treatment", "B", "A"))

ADD COMMENT • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Thank you Michael. Do I lose out on anything by not running interactions or the LRT? I figured with the LTR I will know if genes change over time, but I will get that answer too if I use Wald test for each time point. The sample size still bothers me though, especially since the effect is so large. :/

ADD REPLY • link 6.4 years ago MrHodor • 0

0

Entering edit mode

It's complicated, but in short there are many roughly similar ways to analyze the data that will allow you to find differences. This approach above is straightforward to interpret if you aren't familiar with interpreting interaction terms. And in the long term, you can try to locate a collaborator who might be able to help with analysis design.

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Hi Michael

A very short question related to the post

Is possible to test this contrast in Deseq2 (A vs Ctrl) vs (B vs Ctrl)

Is better do it in the previous way, instead to the set operation implicit in your answer (A vs Ctrl ) - ( B vs Ctrl )

Thanks in advance

ADD REPLY • link 5.2 years ago aimerdiaz.evo • 0

1

Entering edit mode

This has been asked on the support site a few times before. (A/Ctrl) / (B/Ctrl) = A/B

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

I just find this related question https://www.biostars.org/p/115685/

resultsNames(des) Ctrl=1, B=2,C=3 Is this the way to translate this (A/Ctrl) / (B/Ctrl) = A/B : results(des, lfcThreshold = 2, alpha=0.05, contrast = c(2/1,3/1) ) Thanks in advance

ADD REPLY • link 5.2 years ago aimerdiaz.evo • 0

0

Entering edit mode

No. You should just compare A to B. Please make a full formed separate question if you have one, but there’s not much to it, the control drops out of the fold change based on simple arithmetic.

ADD REPLY • link 5.2 years ago Michael Love 41k