Question

DESeq2 design nested or time course

0

Entering edit mode

katarina.truve • 0

@katarinatruve-17413

Last seen 5.6 years ago

Hi,

We are trying to use DESeq2 to analyse RNAseq data from samples that had either a high or a low response to vaccination. RNA samples are taken at three different time points for each patient. We would like to check for differences in gene expression between groups e.g high and low response, as well as differences between time points both within and between groups.

We figured that we could use a model matrix like the following:

samples <- read.delim("samp.txt")

m1 <- model.matrix(~response+ response:subject.n + response:time, samples)

columns with zeros because of different number of samples per group were removed

and then Deseq2 was run with a full model:

ds = DESeq(ds, full=m1, betaPrior=FALSE)

The sample information look like this (samp.txt):

ID subject time response subject.n

p15_d0_S1 p15 d0 high p1.n

p15_d19_S2 p15 d19 high p1.n

p15_d44_S15 p15 d44 high p1.n

p18_d0_S3 p18 d0 high p2.n

p18_d19_S4 p18 d19 high p2.n

p18_d44_S5 p18 d44 high p2.n

p20_d0_S6 p20 d0 low p1.n

p20_d19_S7 p20 d19 low p1.n

p20_d44_S16 p20 d44 low p1.n

p39_d0_S8 p39 d0 high p3.n

p39_d19_S9 p39 d19 high p3.n

p39_d44_S10 p39 d44 high p3.n

p40_d0_S17 p40 d0 low p2.n

p40_d19_S18 p40 d19 low p2.n

p40_d44_S11 p40 d44 low p2.n

p42_d0_S12 p42 d0 high p4.n

p42_d19_S13 p42 d19 high p4.n

p42_d44_S14 p42 d44 high p4.n

We would like to know if we are performing the analysis as it is supposed to be done. Results look relevant, but we are not sure how to describe how the results were calculated. We assume that this design is controlling for individual effects (within subject correlation), but how can we describe it?

When looking at resultsName(ds) we get for instance “responsehigh.timed44”. Is it correct to say that this shows the difference in gene expression for high response individuals between d0(reference) and d44, controlling for individual differences at d0 using a paired design?, or how would you describe it?

When using contrast e.g:

test<- results(ds, contrast=list("responselow.timed19","responsehigh.timed19"))

So here we look at differences at d19 (compared to d0) for the two groups, but how was individual differences accounted for? Are all differences absorbed for individuals also between groups at d0 or how is it done?

Finally we get the results “responsehigh”. How can we describe correctly what this results show, in relation to time points and individual differences? Are any differences absorbed or does it show differences at baseline d0?

Finally we tested another design described as a time course experiment in your vignette:

ddsTC <- DESeqDataSetFromMatrix(countData=Genecounts, colData=samples, design=~response + time + response:time)

ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ response + time)

resTC <- results(ddsTC)

Are both kinds of design applicable? Would you say one is more appropriate for this kind of data?

Do we understand it correctly that baseline differences at d0 are absorbed with this time course design, but that there is no pairing of individuals?

Thank you in advance! Your help would be greatly appreciated.

Best,

Katarina

deseq2 nested design • 1.2k views

ADD COMMENT • link updated 5.6 years ago by Michael Love 41k • written 5.6 years ago by katarina.truve • 0

score 1 · Answer 1 · 2018-09-20

This first setup is the correct setup. You have controlled for each patient, allowing you to test for time-dependent changes within each response group, or compare time-dependent changes across response group. Because the patients are different in the two response groups, and you want to control for patient baseline, you can't directly compare high vs low at a single time point (if this is confusing, you may want to discuss more with a local statistician why this is not a possible contrast). But you can test if the time-dependent changes differ across response group.

Your description of responsehigh.timed44 is accurate.

The contrast of responselow.timed19 and responsehigh.timed19 is testing whether the d19 over d0 effect is different across response group, controlling for individual baseline differences.

The main effects "responsehigh" are not interpretable here because of the way we are controlling for patient. Again, that's a limitation because you want to control for patient baseline, and in DESeq2 we fit fixed effects.

The second setup doesn't control for patient ("no pairing" as you say) and I would prefer the first.