DESeq2 design nested or time course
1
0
Entering edit mode
@katarinatruve-17413
Last seen 6.3 years ago

Hi,

We are trying to use DESeq2 to analyse RNAseq data from samples that had either a high or a low response to vaccination.  RNA samples are taken at three different time points for each patient. We would like to check for differences in gene expression between groups e.g high and low response, as well as differences between time points both within and between groups.

We figured that we could use a model matrix like the following:

samples <- read.delim("samp.txt")

 m1 <- model.matrix(~response+ response:subject.n + response:time, samples)

 columns with zeros because of different number of samples per group were removed

and then Deseq2 was run with a full model:

 ds = DESeq(ds, full=m1, betaPrior=FALSE)

The sample information look like this (samp.txt):

ID                            subject        time   response      subject.n

p15_d0_S1              p15           d0             high          p1.n

p15_d19_S2           p15           d19           high          p1.n

p15_d44_S15         p15           d44           high          p1.n

p18_d0_S3              p18           d0             high          p2.n

p18_d19_S4           p18           d19           high          p2.n

p18_d44_S5           p18           d44           high          p2.n

p20_d0_S6              p20           d0             low           p1.n

p20_d19_S7           p20           d19           low           p1.n

p20_d44_S16         p20           d44           low           p1.n

p39_d0_S8              p39           d0             high          p3.n

p39_d19_S9           p39           d19           high          p3.n

p39_d44_S10         p39           d44           high          p3.n

p40_d0_S17           p40           d0             low           p2.n

p40_d19_S18         p40           d19           low           p2.n

p40_d44_S11         p40           d44           low           p2.n

p42_d0_S12           p42           d0             high          p4.n

p42_d19_S13         p42           d19           high          p4.n

p42_d44_S14         p42           d44           high          p4.n

We would like to know if we are performing the analysis as it is supposed to be done.  Results look relevant, but we are not sure how to describe how the results were calculated. We assume that this design is controlling for individual effects (within subject correlation), but how can we describe it?

When looking at resultsName(ds) we get for instance “responsehigh.timed44”. Is it correct to say that this shows the difference in gene expression for high response individuals between d0(reference) and d44, controlling for individual differences at d0 using a paired design?, or how would you describe it?

When using contrast e.g:

test<- results(ds, contrast=list("responselow.timed19","responsehigh.timed19"))

So here we look at differences at d19 (compared to d0) for the two groups, but how was individual differences accounted for? Are all differences absorbed for individuals also between groups at d0 or how is it done?

Finally we get the results “responsehigh”. How can we describe correctly what this results show, in relation to time points and individual differences? Are any differences absorbed or does it show differences at baseline d0?

Finally we tested another design described as a time course experiment in your vignette:

ddsTC <- DESeqDataSetFromMatrix(countData=Genecounts, colData=samples, design=~response + time + response:time)

ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ response + time)

resTC <- results(ddsTC)

Are both kinds of design applicable? Would you say one is more appropriate for this kind of data?

Do we understand it correctly that baseline differences at d0 are absorbed with this time course design, but that there is no pairing of individuals?

Thank you in advance! Your help would be greatly appreciated.

Best,

Katarina

deseq2 nested design • 1.4k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

This first setup is the correct setup. You have controlled for each patient, allowing you to test for time-dependent changes within each response group, or compare time-dependent changes across response group. Because the patients are different in the two response groups, and you want to control for patient baseline, you can't directly compare high vs low at a single time point (if this is confusing, you may want to discuss more with a local statistician why this is not a possible contrast). But you can test if the time-dependent changes differ across response group.

Your description of responsehigh.timed44 is accurate.

The contrast of responselow.timed19 and responsehigh.timed19 is testing whether the d19 over d0 effect is different across response group, controlling for individual baseline differences.

The main effects "responsehigh" are not interpretable here because of the way we are controlling for patient. Again, that's a limitation because you want to control for patient baseline, and in DESeq2 we fit fixed effects.

The second setup doesn't control for patient ("no pairing" as you say) and I would prefer the first.

ADD COMMENT

Login before adding your answer.

Traffic: 486 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6