Question

DESeq2 Model Design - The right thing to do

0

Entering edit mode

andrew.j.skelton73 ▴ 370

@andrewjskelton73-7074

Last seen 5 weeks ago

United Kingdom

Hi,

I've got some data that comes under an unusual design and I was looking for some advice. The experiment consists of 3 time points, 3 conditions, in triplicate, and it's paired by treatment.

Design Matrix

> design
   Pairing Treatment TimePoint
1        A    type1       2hr
2        A    type2       2hr
3        A    type3       2hr
4        B    type1       6hr
5        B    type2       6hr
6        B    type3       6hr
7        G    type1      24hr
8        G    type2      24hr
9        G    type3      24hr
10       C    type1       2hr
11       C    type2       2hr
12       C    type3       2hr
13       D    type1       6hr
14       D    type2       6hr
15       D    type3       6hr
16       H    type1      24hr
17       H    type2      24hr
18       H    type3      24hr
19       E    type1       2hr
20       E    type2       2hr
21       E    type3       2hr
22       F    type1       6hr
23       F    type2       6hr
24       F    type3       6hr
25       I    type1      24hr
26       I    type2      24hr
27       I    type3      24hr

Model Design

~Pairing + Treatment * TimePoint

Returns an error that the model is not full rank, understandably. The only comparisons I'm interested in are between the treatments at each time point (relative to the pairings). So I can simply filter the samples I put into DESeq2 by time point, then use the model:

~Pairing + Treatment

Which allows me to do the between treatment contrasts. I was looking for ways to include all samples (as DESeq2 will utilise all samples for dispersion estimation, I believe). I came across section 3.12.1 of the DESeq2 users guide, which states that I can do "Comparisons Both Between and Within Subjects" by adding a design column that distinguishes nested pairings.

So adding to the design matrix:

> design
   Pairing Treatment TimePoint Pairing.rf
1        A    type1       2hr          1
2        A    type2       2hr          1
3        A    type3       2hr          1
4        B    type1       6hr          1
5        B    type2       6hr          1
6        B    type3       6hr          1
7        G    type1      24hr          1
8        G    type2      24hr          1
9        G    type3      24hr          1
10       C    type1       2hr          2
11       C    type2       2hr          2
12       C    type3       2hr          2
13       D    type1       6hr          2
14       D    type2       6hr          2
15       D    type3       6hr          2
16       H    type1      24hr          2
17       H    type2      24hr          2
18       H    type3      24hr          2
19       E    type1       2hr          3
20       E    type2       2hr          3
21       E    type3       2hr          3
22       F    type1       6hr          3
23       F    type2       6hr          3
24       F    type3       6hr          3
25       I    type1      24hr          3
26       I    type2      24hr          3
27       I    type3      24hr          3

...and the model design:

~TimePoint + TimePoint:Pairing.rf + TimePoint:Treatment

...allows me to compare between groups, for example:

results(dds2, contrast=list(c("TimePoint24hr.Treatmenttype3"),
                            c("TimePoint24hr.Treatmenttype1")))

...But I'm don't think that this approach truly utilises the pairings. Ultimately, I'm asking what the best approach to take is, and if there's a benefit to using the second approach over the first?

Thanks for any insight!

deseq2 • 1.6k views

ADD COMMENT • link updated 8.7 years ago by Michael Love 41k • written 8.7 years ago by andrew.j.skelton73 ▴ 370

score 2 · Accepted Answer · 2015-08-03

hi Andrew,

I find it a bit easier to look at it like this:

   TimePoint Treatment Pairing
1        2hr     type1       A
2        2hr     type1       C
3        2hr     type1       E
4        2hr     type2       A
5        2hr     type2       C
6        2hr     type2       E
7        2hr     type3       A
8        2hr     type3       C
9        2hr     type3       E
10       6hr     type1       B
11       6hr     type1       D
12       6hr     type1       F
13       6hr     type2       B
14       6hr     type2       D
15       6hr     type2       F
16       6hr     type3       B
17       6hr     type3       D
18       6hr     type3       F
19      24hr     type1       G
20      24hr     type1       H
21      24hr     type1       I
22      24hr     type2       G
23      24hr     type2       H
24      24hr     type2       I
25      24hr     type3       G
26      24hr     type3       H
27      24hr     type3       I

You said: "The only comparisons I'm interested in are between the treatments at each time point (relative to the pairings)."

So if you just want to compare treatments at each time point, I would add a new grouping factor:

dds$group <- factor(paste(dds$TimePoint, dds$Treatment))

Then add a factor column, pairing.nested, which gives us the pairing information within time, which in my representation above would go: 1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,....

use a design:

~ TimePoint:pairing.nested + group

And use, e.g.:

results(dds, contrast=c("group", "2hrType2", "2hrType1"))