Deseq2 time series question,
1
1
Entering edit mode
@cannedcanary-11827
Last seen 4.6 years ago

Dear all, I would very much appreciate any assistance in the analysis of our RNA-SEQ data, i feel that i'm making a mistake somewhere but cant quite lay my finger on where this is happening. 

Study design: 8 wasp allergic patients were stung by a wasp, peripheral blood samples were drawn before the sting (T0) and at 5 set time points afterwards (T5, T11, T15, T30 and T60 minutes). 3 patients developed a systemic reaction (WelReactie), 5 patients dit not develop a systemic reaction (GeenReactie).  

Study question:  Which genes are deferentially expressed at any time-point between the group "WelReactie" and the group "GeenReactie"

Code

library(DESeq2)
directory = 'C:/Users/Default.COLTON/Documents/R/Longitudinal'
setwd(directory)
sampleFiles <- grep(".htseq", list.files(directory),value=T)
sampleFiles
sampleCondition <- c('GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie')
time <- factor(c('T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60'))
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, time=time)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition+time+time:condition)
ddsHTSeq$condition<-factor(ddsHTSeq$condition, levels=c("GeenReactie","WelReactie"))
ddsHTSeq$time <- factor(ddsHTSeq$time, levels=c("T0", "T5", "T11", "T15", "T30", "T60"))
dds = DESeq(ddsHTSeq, test="LRT", reduced=~condition + time)
resTC <- results(dds)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),],4)
write.csv(res, file="DATE-Longitudinal.csv")

Results: 

  baseMean log2FoldChange lfcSE stat pvalue padj
ENSG00000124882 11.87947 3.418625 0.899554 25.94876 9.13E-05 1
ENSG00000198918 34.99821 2.464948 0.583217 21.557 0.000635 1
ENSG00000170632 49.10306 1.171724 0.285648 20.24493 0.001124 1
ENSG00000085274 90.14553 0.950642 0.231124 19.91252 0.001298 1

Outcome:  To my surprise, all adjusted P values were exactly 1, whereas just a standard crossectional analysis for each time point sepperatly does indicate DEG.

Bioconductor question: are there no DEG or, far more likely, what am i doing wrong?

 

 

rnaseq deseq2 timecourse • 1.1k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

"whereas just a standard cross-sectional analysis for each time point separately does indicate DEG"

Do you mean for the top gene?

One detail is: the test you have performed is for differences after T0. It will not find a gene significant if there are differences at each time point which are equal in magnitude to the difference at T0.

Can you post a plotCounts() of the top gene? See the time series section in the workflow.

 

ADD COMMENT
0
Entering edit mode

Dear Micheal, 

To clarify what I meant: deseq analysis of all 8 samples taken at baseline indicate 24 DEG (adjp<0.05), between the groups with and without a systemic reaction.

In addition, deseq analysis of all 8 samples taken at T60 minutes indicates 300 DEG (adjp<0.05), between the groups with and without a systemic reaction. To me, this is an indication of some time dependent differential expression of genes.

Attached is the plotCounts() of the top genes based on not adjusted p value, as all adjusted P values are 1.

   

https://postimg.org/image/d6chg26il/

ADD REPLY
0
Entering edit mode

It looks like there is DE at T60, but the other time points are not so clear, and the within-condition variability is very high, which leads to not-so-small p-values. So it could be the case that you have more power to detect the differences at T60 than when you do the LRT. 

Given that there is a lot of within-condition variability here, I'm wondering if you would have improved power by reducing the number of terms you test in the LRT. Usually, I don't recommend to use time as a linear covariate, but it could potentially help here to recover a few genes which differ in the slope of log counts over time.

What if you try:

dds$t <- as.numeric(sub("T(.*)","\\1",as.character(dds$time)))
design(dds) <- ~ condition + t + condition:t
dds <- DESeq(dds, test="LRT", reduced=~condition + t)
ADD REPLY

Login before adding your answer.

Traffic: 563 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6