[DESeq2] time-course Experimental design
inzirio

I'd like to set up an experimental design for the DESeq2 package using the LRT (likelihood ratio test).

I've already done this before with two conditions, Treated (T) vs Untreated (U), each sampled at several time points.

I was wondering how to design an experiment that has the U replicates only at time point 0 and the T replicates at several later time points.

Is it possible to do this with DESeq2? I can't find any examples for this special case, although I can find several time-course datasets designed this way.

Thanks,
Inzirio

rnaseq deseq2 time-course • 1.5k views
Gavin Kelly

Sounds like you just need a single factor to model the experiment: if U corresponds to 0 hours and T has, e.g., 2 hours, 4 hours and 16 hours, then you'd simply label the replicates with a single factor with levels 0h, 2h, 4h, 16h, etc. You obviously won't be able to infer anything about how the untreated samples evolve over time, so you won't be able to remove such effects from the analysis, and all your conclusions will have to be worded with this in mind. You say you've previously done the other, more complete design, which to me seems to offer a safer approach to meaningful biological hypotheses, so I guess it's cost issues that are driving the question?

The only option for an LRT with one factor is to compare ~time against ~1, which looks for genes that reject the null hypothesis of constant expression across all 'timepoints' (i.e. expression is the same across all the treated samples, and the same as in the untreated).
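A minimal sketch of the ~time versus ~1 comparison described above, not code from the thread. The counts are simulated with DESeq2's makeExampleDESeqDataSet, and the sample layout (three replicates each at 0h, 2h, 4h and 16h, with 0h standing in for the untreated samples) is an assumption made purely for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 500 genes x 12 samples (illustrative data only)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical layout: untreated = 0h, treated = 2h, 4h, 16h; 3 replicates each
dds$time <- factor(rep(c("0h", "2h", "4h", "16h"), each = 3),
                   levels = c("0h", "2h", "4h", "16h"))
design(dds) <- ~ time

# LRT: full model ~ time compared against the reduced model ~ 1
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

# A small p-value is evidence against "same expression at every timepoint";
# it does not say which timepoint differs
head(res_lrt[order(res_lrt$pvalue), ])
```

Note that the log2FoldChange column in LRT results refers to a single coefficient, while the p-value reflects the whole time factor.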

DESeq2 is more than capable of answering comparisons between pairs of timepoints, including the 0h (untreated) vs 2h (treated), for example - for this you'd need the Wald test, rather than the LRT.  Just remember to code the timepoint term as a factor, as if you code it as a numeric, the comparisons will look for linear changes of (transformed) expression across time, which mightn't be what you want.
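For the pairwise comparisons mentioned above, something along the following lines might work. As before, the data are simulated and the 0h/2h/4h/16h layout is a hypothetical example rather than the original poster's experiment; the contrast argument of results() is used to pick the two levels to compare.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 500 genes x 12 samples (illustrative data only)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Code time as a factor, with the untreated 0h samples as the reference level
dds$time <- factor(rep(c("0h", "2h", "4h", "16h"), each = 3),
                   levels = c("0h", "2h", "4h", "16h"))
design(dds) <- ~ time

# Default test in DESeq() is the Wald test
dds_wald <- DESeq(dds)

# Untreated (0h) vs 2h: log2 fold change of 2h over 0h
res_2h_vs_0h <- results(dds_wald, contrast = c("time", "2h", "0h"))

# Any other pair works the same way, e.g. 4h over 2h
res_4h_vs_2h <- results(dds_wald, contrast = c("time", "4h", "2h"))

summary(res_2h_vs_0h)
```

The contrast vector is read as c(factor name, numerator level, denominator level), so 0h is not treated specially beyond being the reference level of the fitted model.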


"I guess it's cost issues that are driving the question?"

I'm trying to understand and capture the different approaches to modelling time-course data.
The more complex design gave me no problems with DESeq2, and it is also the best way to model a time-course experiment. In the other case (the cheapest one), however, my first attempt gave me problems while running DESeq2, which is why I was asking about this special case (and also because I noticed that this design is very common in time-course data published on the internet).

For the second part:
"DESeq2 is more than capable of answering comparisons between pairs of timepoints, including the 0h (untreated) vs 2h (treated), for example - for this you'd need the Wald test, rather than the LRT."

Yeah, looking at the "complete" LRT results, it's possible to capture all the comparisons made between time-points versus the 0h, and also to re-test the genes found by the LRT by running a Wald test on them. But these will always be differences relative to time point 0. That answers a different biological question from the one addressed by the LRT.

"Just remember to code the timepoint term as a factor, as if you code it as a numeric, the comparisons will look for linear changes of (transformed) expression across time, which mightn't be what you want."
Could you please explain this part in more detail?

Thanks,
Inzirio

Gavin Kelly

I'm still a bit confused about your use of the LRT in the 'cheaper' design. It does not look at 'all... timepoints versus the 0h'; it tests whether the null hypothesis of all timepoints being the same holds. So an LRT could feasibly come out as significant if the final timepoint were different from all previous timepoints, in which case one would probably not interpret that finding in terms of the untreated (0h) timepoint: there's nothing special about 0hr in the LRT. Similarly, there's not necessarily anything special about 0hr in the Wald test, as you can test 2hr vs 4hr just as easily as Untreated (0hr) vs 2hr. As you say, these are different biological questions, but neither the LRT nor the Wald test will treat the untreated samples differently from the other timepoints; nor will they 'know' that 4hr lies in between 2hr and 8hr, for instance.

My final point is probably something you're doing anyway, but I tend to include it as a warning to readers of answers: if, say, you encode timepoint <- c(0,0,0,3,3,3,6,6,6) and then put ~timepoint in your design, you'll get only one coefficient out, whose significance indicates a linear trend in expression over time (so in this case there is some 'knowledge' that 4 is in between 2 and 8). This alternative hypothesis is different from the one most people expect, which is pairwise differences between timepoints (achieved with timepoint <- factor(timepoint)).
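To illustrate the difference between the two encodings, here is a small base-R sketch using model.matrix, which shows the columns a linear-model design would contain. It is only meant to show the coding itself, not a DESeq2 run; the variable names X_numeric, X_factor and timepoint_f are made up for the example.

```r
timepoint <- c(0, 0, 0, 3, 3, 3, 6, 6, 6)

# Numeric coding: an intercept plus a single slope column (linear trend)
X_numeric <- model.matrix(~ timepoint)
colnames(X_numeric)
# "(Intercept)" "timepoint"

# Factor coding: an intercept plus one column per non-reference level,
# i.e. one coefficient for 3 vs 0 and one for 6 vs 0
timepoint_f <- factor(timepoint)
X_factor <- model.matrix(~ timepoint_f)
colnames(X_factor)
# "(Intercept)" "timepoint_f3" "timepoint_f6"
```

With the numeric coding there is one coefficient to test however many timepoints there are; with the factor coding there is one coefficient per non-reference timepoint.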