Dear all, I would very much appreciate any assistance in the analysis of our RNA-SEQ data, i feel that i'm making a mistake somewhere but cant quite lay my finger on where this is happening.
Study design: 8 wasp allergic patients were stung by a wasp, peripheral blood samples were drawn before the sting (T0) and at 5 set time points afterwards (T5, T11, T15, T30 and T60 minutes). 3 patients developed a systemic reaction (WelReactie), 5 patients dit not develop a systemic reaction (GeenReactie).
Study question: Which genes are deferentially expressed at any time-point between the group "WelReactie" and the group "GeenReactie"
Code
library(DESeq2)
directory = 'C:/Users/Default.COLTON/Documents/R/Longitudinal'
setwd(directory)
sampleFiles <- grep(".htseq", list.files(directory),value=T)
sampleFiles
sampleCondition <- c('GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie')
time <- factor(c('T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60'))
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, time=time)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition+time+time:condition)
ddsHTSeq$condition<-factor(ddsHTSeq$condition, levels=c("GeenReactie","WelReactie"))
ddsHTSeq$time <- factor(ddsHTSeq$time, levels=c("T0", "T5", "T11", "T15", "T30", "T60"))
dds = DESeq(ddsHTSeq, test="LRT", reduced=~condition + time)
resTC <- results(dds)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),],4)
write.csv(res, file="DATE-Longitudinal.csv")
Results:
| baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | |
| ENSG00000124882 | 11.87947 | 3.418625 | 0.899554 | 25.94876 | 9.13E-05 | 1 |
| ENSG00000198918 | 34.99821 | 2.464948 | 0.583217 | 21.557 | 0.000635 | 1 |
| ENSG00000170632 | 49.10306 | 1.171724 | 0.285648 | 20.24493 | 0.001124 | 1 |
| ENSG00000085274 | 90.14553 | 0.950642 | 0.231124 | 19.91252 | 0.001298 | 1 |
Outcome: To my surprise, all adjusted P values were exactly 1, whereas just a standard crossectional analysis for each time point sepperatly does indicate DEG.
Bioconductor question: are there no DEG or, far more likely, what am i doing wrong?

Dear Micheal,
To clarify what I meant: deseq analysis of all 8 samples taken at baseline indicate 24 DEG (adjp<0.05), between the groups with and without a systemic reaction.
In addition, deseq analysis of all 8 samples taken at T60 minutes indicates 300 DEG (adjp<0.05), between the groups with and without a systemic reaction. To me, this is an indication of some time dependent differential expression of genes.
Attached is the plotCounts() of the top genes based on not adjusted p value, as all adjusted P values are 1.
https://postimg.org/image/d6chg26il/
It looks like there is DE at T60, but the other time points are not so clear, and the within-condition variability is very high, which leads to not-so-small p-values. So it could be the case that you have more power to detect the differences at T60 than when you do the LRT.
Given that there is a lot of within-condition variability here, I'm wondering if you would have improved power by reducing the number of terms you test in the LRT. Usually, I don't recommend to use time as a linear covariate, but it could potentially help here to recover a few genes which differ in the slope of log counts over time.
What if you try:
dds$t <- as.numeric(sub("T(.*)","\\1",as.character(dds$time))) design(dds) <- ~ condition + t + condition:t dds <- DESeq(dds, test="LRT", reduced=~condition + t)