Question

Deseq2 time series question,

1

Entering edit mode

cannedcanary • 0

@cannedcanary-11827

Last seen 7.4 years ago

Dear all, I would very much appreciate any assistance in the analysis of our RNA-SEQ data, i feel that i'm making a mistake somewhere but cant quite lay my finger on where this is happening.

Study design: 8 wasp allergic patients were stung by a wasp, peripheral blood samples were drawn before the sting (T0) and at 5 set time points afterwards (T5, T11, T15, T30 and T60 minutes). 3 patients developed a systemic reaction (WelReactie), 5 patients dit not develop a systemic reaction (GeenReactie).

Study question: Which genes are deferentially expressed at any time-point between the group "WelReactie" and the group "GeenReactie"

Code

library(DESeq2)
directory = 'C:/Users/Default.COLTON/Documents/R/Longitudinal'
setwd(directory)
sampleFiles <- grep(".htseq", list.files(directory),value=T)
sampleFiles
sampleCondition <- c('GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie','GeenReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie',
'WelReactie','WelReactie','WelReactie','WelReactie','WelReactie','WelReactie')
time <- factor(c('T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60',
'T0', 'T11', 'T15', 'T30', 'T5', 'T60'))
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, time=time)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition+time+time:condition)
ddsHTSeq$condition<-factor(ddsHTSeq$condition, levels=c("GeenReactie","WelReactie"))
ddsHTSeq$time <- factor(ddsHTSeq$time, levels=c("T0", "T5", "T11", "T15", "T30", "T60"))
dds = DESeq(ddsHTSeq, test="LRT", reduced=~condition + time)
resTC <- results(dds)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),],4)
write.csv(res, file="DATE-Longitudinal.csv")

Results:

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
ENSG00000124882	11.87947	3.418625	0.899554	25.94876	9.13E-05	1
ENSG00000198918	34.99821	2.464948	0.583217	21.557	0.000635	1
ENSG00000170632	49.10306	1.171724	0.285648	20.24493	0.001124	1
ENSG00000085274	90.14553	0.950642	0.231124	19.91252	0.001298	1

Outcome: To my surprise, all adjusted P values were exactly 1, whereas just a standard crossectional analysis for each time point sepperatly does indicate DEG.

Bioconductor question: are there no DEG or, far more likely, what am i doing wrong?

rnaseq deseq2 timecourse • 1.7k views

ADD COMMENT • link updated 7.4 years ago by Michael Love 41k • written 7.4 years ago by cannedcanary • 0

score 1 · Answer 1 · 2016-11-11

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 56 minutes ago

United States

"whereas just a standard cross-sectional analysis for each time point separately does indicate DEG"

Do you mean for the top gene?

One detail is: the test you have performed is for differences after T0. It will not find a gene significant if there are differences at each time point which are equal in magnitude to the difference at T0.

Can you post a plotCounts() of the top gene? See the time series section in the workflow.

ADD COMMENT • link 7.4 years ago Michael Love 41k

0

Entering edit mode

Dear Micheal,

To clarify what I meant: deseq analysis of all 8 samples taken at baseline indicate 24 DEG (adjp<0.05), between the groups with and without a systemic reaction.

In addition, deseq analysis of all 8 samples taken at T60 minutes indicates 300 DEG (adjp<0.05), between the groups with and without a systemic reaction. To me, this is an indication of some time dependent differential expression of genes.

Attached is the plotCounts() of the top genes based on not adjusted p value, as all adjusted P values are 1.

https://postimg.org/image/d6chg26il/

ADD REPLY • link 7.4 years ago cannedcanary • 0

0

Entering edit mode

It looks like there is DE at T60, but the other time points are not so clear, and the within-condition variability is very high, which leads to not-so-small p-values. So it could be the case that you have more power to detect the differences at T60 than when you do the LRT.

Given that there is a lot of within-condition variability here, I'm wondering if you would have improved power by reducing the number of terms you test in the LRT. Usually, I don't recommend to use time as a linear covariate, but it could potentially help here to recover a few genes which differ in the slope of log counts over time.

What if you try:

dds$t <- as.numeric(sub("T(.*)","\\1",as.character(dds$time)))
design(dds) <- ~ condition + t + condition:t
dds <- DESeq(dds, test="LRT", reduced=~condition + t)

ADD REPLY • link 7.4 years ago Michael Love 41k