We are trying to use DESeq2 to analyse RNAseq data from samples that had either a high or a low response to vaccination. RNA samples are taken at three different time points for each patient. We would like to check for differences in gene expression between groups e.g high and low response, as well as differences between time points both within and between groups.
We figured that we could use a model matrix like the following:
samples <- read.delim("samp.txt")
m1 <- model.matrix(~response+ response:subject.n + response:time, samples)
columns with zeros because of different number of samples per group were removed
and then Deseq2 was run with a full model:
ds = DESeq(ds, full=m1, betaPrior=FALSE)
The sample information look like this (samp.txt):
ID subject time response subject.n
p15_d0_S1 p15 d0 high p1.n
p15_d19_S2 p15 d19 high p1.n
p15_d44_S15 p15 d44 high p1.n
p18_d0_S3 p18 d0 high p2.n
p18_d19_S4 p18 d19 high p2.n
p18_d44_S5 p18 d44 high p2.n
p20_d0_S6 p20 d0 low p1.n
p20_d19_S7 p20 d19 low p1.n
p20_d44_S16 p20 d44 low p1.n
p39_d0_S8 p39 d0 high p3.n
p39_d19_S9 p39 d19 high p3.n
p39_d44_S10 p39 d44 high p3.n
p40_d0_S17 p40 d0 low p2.n
p40_d19_S18 p40 d19 low p2.n
p40_d44_S11 p40 d44 low p2.n
p42_d0_S12 p42 d0 high p4.n
p42_d19_S13 p42 d19 high p4.n
p42_d44_S14 p42 d44 high p4.n
We would like to know if we are performing the analysis as it is supposed to be done. Results look relevant, but we are not sure how to describe how the results were calculated. We assume that this design is controlling for individual effects (within subject correlation), but how can we describe it?
When looking at resultsName(ds) we get for instance “responsehigh.timed44”. Is it correct to say that this shows the difference in gene expression for high response individuals between d0(reference) and d44, controlling for individual differences at d0 using a paired design?, or how would you describe it?
When using contrast e.g:
test<- results(ds, contrast=list("responselow.timed19","responsehigh.timed19"))
So here we look at differences at d19 (compared to d0) for the two groups, but how was individual differences accounted for? Are all differences absorbed for individuals also between groups at d0 or how is it done?
Finally we get the results “responsehigh”. How can we describe correctly what this results show, in relation to time points and individual differences? Are any differences absorbed or does it show differences at baseline d0?
Finally we tested another design described as a time course experiment in your vignette:
ddsTC <- DESeqDataSetFromMatrix(countData=Genecounts, colData=samples, design=~response + time + response:time)
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ response + time)
resTC <- results(ddsTC)
Are both kinds of design applicable? Would you say one is more appropriate for this kind of data?
Do we understand it correctly that baseline differences at d0 are absorbed with this time course design, but that there is no pairing of individuals?
Thank you in advance! Your help would be greatly appreciated.