Question

DESEQ ANODEV : A time course study

0

Entering edit mode

Michael Breen ▴ 20

@michael-breen-5981

Last seen 9.6 years ago

Hi all,

We are running into a problem using an ANODEV with DESeq.

Let me state our data, our aim, and our dilemma.

Our data (time course with various times titled A, B and C):

A design matrix would look something like this (this is a dramatic simplification, our real data contrains over 200 participants) :

Design <- data.frame(row.names=colnames(CountTable)
, timecourse = c("A","A","A","A","A","A",
"B","B","B","B","B","B","C","C","C","C","C","C")
, pairs=c("1", "2", "3", "4", "5", "6","1", "2", "3", "4",
"5","6","1", "2",
"3", "4", "5", "6")
, condition=c("Control", "Control", "Control", "Disease", "Disease",
"Disease","Control", "Control", "Control", "Disease", "Disease",
"Disease","Control", "Control", "Control", "Disease", "Disease",
"Disease"))
countTable <- CountTable
cds <- newCountDataSet(countTable, Design)
cds <- estimateSizeFactors(cds)
sizeFactors( cds )
cds <- estimateDispersions(cds, method="pooled-CR", fitType="local")

Our aim:

What we aim to do is to test for DE of transcripts across all 3 time points for disease and controls seperatly (using DESeq ANODEV) but we want to be able to identify at which time points these transcripts are being DE. In other words, we want to compare DE transcripts with respect to specific time points between cases and controls. Our remaining code looks like this:

fit0 <- fitNbinomGLMs (cds, count ~ timecourse)
fit1 <- fitNbinomGLMs ( cds, count ~ timecourse + condition )
str(fit1)
pvals <- nbinomGLMTest( fit1, fit0 )
padj <- p.adjust( pvals, method="fdr" )
rownames(counts(cds))[ padj < .1 ]

Our delimma:

ANODEV is in fact yielding a list of significantly differentially expressed genes. However, it is not informative if we want to know from which time point this differential expression occurred. From our knowledge there is no implementation of post-hoc testing(or equivilent) after ANODEV in DESeq that states at which time point a gene is being called significant from within a timecourse study.

What insight can anyone offer us regarding DESeq towards better understanding at which time points are our DEGs are related to? In short, how can one tell at which time points (or both) is DESeq calling a transcript significantly DE?

Also, does this approach take into consideration the matching paired data?

Yours,
Michael

TimeCourse timecourse DESeq edgeR • 2.7k views

ADD COMMENT • link updated 9.4 years ago by Gordon Smyth 50k • written 10.9 years ago by Michael Breen ▴ 20

score 0 · Answer 1 · 2013-06-10

Hi Michael On 08/06/13 03:00, Michael Breen wrote: > What we aim to do is to test for DE of transcripts across all 3 time > points for disease and controls seperatly (using DESeq ANODEV) but we want > to be able to identify at which > time points these transcripts are being DE. In other words, we want to > compare DE transcripts with > respect to specific time points between cases and controls. Our remaining > code looks like this: > > fit0 <- fitNbinomGLMs (cds, count ~ timecourse) > fit1 <- fitNbinomGLMs ( cds, count ~ timecourse + condition ) > str(fit1) One possibility would be to subset your data to only samples from one time point and then test cases against control to see the genes that are DE at this time point, then go on to the next one. If you consider this a post-hoc test and only look at the genes which show overall sensitivity, you can probably be more lenient on the significance threshold. Maybe other people on the list have input on this point. Simon

Gordon Smyth · Answer 2 · 2013-06-11

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 17 minutes ago

WEHI, Melbourne, Australia

Dear Michael,

What you want to do is easy and fast using the edgeR package, without any need for ad hoc workarounds like subsetting your data. See McCarthy et all (NAR 2012):

http://www.ncbi.nlm.nih.gov/pubmed/22287627

Your experimental design seems to fit exactly into the framework of Section 3.3 of the edgeR User's Guide.

Best wishes

Gordon

ADD COMMENT • link 10.9 years ago • updated 9.4 years ago Gordon Smyth 50k

0

Entering edit mode

Cheers!

I think this is applicable to our needs.

Thank you Gentlemen!

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 50k • written 10.9 years ago by Michael Breen ▴ 20

Gordon Smyth · Answer 3 · 2013-06-17

0

Entering edit mode

Michael Breen ▴ 370

@michael-breen-5999

Last seen 9.6 years ago

Gordon!

On second pass, edgeR doesn't seem to address my question.

We have disease and control data spread over 3 time points (baseline-T0, time after disease causing event-T1, elapsed time after event-T3) for each of these two conditions. I want to be able to find DEGs accordingly to condition across time-points to decipher which times (either at treatment or elapsed time after treatment) our genes are DE.

Any further insight would be fantastic.

Yours,
Michael

ADD COMMENT • link updated 9.4 years ago by Gordon Smyth 50k • written 10.9 years ago by Michael Breen ▴ 370

0

Entering edit mode

Hi Michael,

The simplest and most understandable approach I've found is to define a single factor that splits your data into all the groups. So there would be levels for disease-T0, disease-T1, disease-t3, control-T0, control-T1, and control-T3. Then you can use ~0+group as your model formula and define whatever contrasts you desire between these six groups.

-Ryan Thompson

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 50k • written 10.9 years ago by Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Ryan,

If one were to split it up, what's the difference between these contrasts and a classical class comparison approach? Here you seem to be comparing gene expression differences between two classes while igorning matching data.

Any ideas?
Michael

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 50k • written 10.9 years ago by Michael Breen ▴ 370

0

Entering edit mode

Hi Michael,

Further to Ryan's post, your experimental design seems to fit exactly into the framework of Section 3.3 of the edgeR User's Guide. The approach that Ryan recommends is that of Section 3.3.1. Section 3.3.2 describes another equivalent way to setup the comparisons you want.

Note that you have previously considered additive models like:

   ~ timecourse + condition

but this is not correct because it assumes that the time course effects are exactly the same for both conditions. If you want to test for time effects for each condition separately, then you must use the all-combination models considered in Section 3.3 of the edgeR User's Guide.

Best wishes
Gordon

ADD REPLY • link 10.9 years ago • updated 9.4 years ago Gordon Smyth 50k