DESEQ ANODEV : A time course study
3
0
Entering edit mode
@michael-breen-5981
Last seen 7.1 years ago

Hi all,

We are running into a problem using an ANODEV with DESeq.

Let me state our data, our aim, and our dilemma.

Our data (time course with various times titled A, B and C):

A design matrix would look something like this (this is a dramatic simplification, our real data contrains over 200 participants) :

Design <- data.frame(row.names=colnames(CountTable)
, timecourse = c("A","A","A","A","A","A",
"B","B","B","B","B","B","C","C","C","C","C","C")
, pairs=c("1", "2", "3", "4", "5", "6","1", "2", "3", "4",
"5","6","1", "2",
"3", "4", "5", "6")
, condition=c("Control", "Control", "Control", "Disease", "Disease",
"Disease","Control", "Control", "Control", "Disease", "Disease",
"Disease","Control", "Control", "Control", "Disease", "Disease",
"Disease"))
countTable <- CountTable
cds <- newCountDataSet(countTable, Design)
cds <- estimateSizeFactors(cds)
sizeFactors( cds )
cds <- estimateDispersions(cds, method="pooled-CR", fitType="local")

Our aim:

What we aim to do is to test for DE of transcripts across all 3 time points for disease and controls seperatly (using DESeq ANODEV) but we want to be able to identify at which time points these transcripts are being DE. In other words, we want to compare DE transcripts with respect to specific time points between cases and controls. Our remaining code looks like this:

fit0 <- fitNbinomGLMs (cds, count ~ timecourse)
fit1 <- fitNbinomGLMs ( cds, count ~ timecourse + condition )
str(fit1)
pvals <- nbinomGLMTest( fit1, fit0 )
rownames(counts(cds))[ padj < .1 ]

Our delimma:

ANODEV is in fact yielding a list of significantly differentially expressed genes. However, it is not informative if we want to know from which time point this differential expression occurred. From our knowledge there is no implementation of post-hoc testing(or equivilent) after ANODEV in DESeq that states at which time point a gene is being called significant from within a timecourse study.

What insight can anyone offer us regarding DESeq towards better understanding at which time points are our DEGs are related to? In short, how can one tell at which time points (or both) is DESeq calling a transcript significantly DE?

Also, does this approach take into consideration the matching paired data?

Yours,
Michael

TimeCourse timecourse DESeq edgeR • 1.8k views
0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 14 months ago
Zentrum für Molekularbiologie, Universi…
Hi Michael On 08/06/13 03:00, Michael Breen wrote: > What we aim to do is to test for DE of transcripts across all 3 time > points for disease and controls seperatly (using DESeq ANODEV) but we want > to be able to identify at which > time points these transcripts are being DE. In other words, we want to > compare DE transcripts with > respect to specific time points between cases and controls. Our remaining > code looks like this: > > fit0 <- fitNbinomGLMs (cds, count ~ timecourse) > fit1 <- fitNbinomGLMs ( cds, count ~ timecourse + condition ) > str(fit1) One possibility would be to subset your data to only samples from one time point and then test cases against control to see the genes that are DE at this time point, then go on to the next one. If you consider this a post-hoc test and only look at the genes which show overall sensitivity, you can probably be more lenient on the significance threshold. Maybe other people on the list have input on this point. Simon
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

Dear Michael,

What you want to do is easy and fast using the edgeR package, without any need for ad hoc workarounds like subsetting your data. See McCarthy et all (NAR 2012):

Your experimental design seems to fit exactly into the framework of Section 3.3 of the edgeR User's Guide.

Best wishes

Gordon

0
Entering edit mode

Cheers!

I think this is applicable to our needs.

Thank you Gentlemen!

0
Entering edit mode
Michael Breen ▴ 370
@michael-breen-5999
Last seen 7.1 years ago

Gordon!

On second pass, edgeR doesn't seem to address my question.

We have disease and control data spread over 3 time points (baseline-T0, time after disease causing event-T1, elapsed time after event-T3) for each of these two conditions. I want to be able to find DEGs accordingly to condition across time-points to decipher which times (either at treatment or elapsed time after treatment) our genes are DE.

Any further insight would be fantastic.

Yours,
Michael

0
Entering edit mode

Hi Michael,

The simplest and most understandable approach I've found is to define a single factor that splits your data into all the groups. So there would be levels for disease-T0, disease-T1, disease-t3, control-T0, control-T1, and control-T3. Then you can use ~0+group as your model formula and define whatever contrasts you desire between these six groups.

-Ryan Thompson

0
Entering edit mode

Ryan,

If one were to split it up, what's the difference between these contrasts and a classical class comparison approach? Here you seem to be comparing gene expression differences between two classes while igorning matching data.

Any ideas?
Michael

0
Entering edit mode

Hi Michael,

Further to Ryan's post, your experimental design seems to fit exactly into the framework of Section 3.3 of the edgeR User's Guide.  The approach that Ryan recommends is that of Section 3.3.1.  Section 3.3.2 describes another equivalent way to setup the comparisons you want.

Note that you have previously considered additive models like:

   ~ timecourse + condition

but this is not correct because it assumes that the time course effects are exactly the same for both conditions.  If you want to test for time effects for each condition separately, then you must use the all-combination models considered in Section 3.3 of the edgeR User's Guide.

Best wishes
Gordon