Question: edgeR: time series analysis
2.7 years ago, BharathAnanth (70) wrote:


I have RNA-seq time course data consisting of 11 individual time points. However, I do not have replicates for each time point. I am trying to fit a simple linear model of the following form to detect oscillations:

time <- seq(2,22,by=2)

in.phase <- cos(2*pi/22*time)
out.phase <- sin(2*pi/22*time)

design <- model.matrix(~in.phase + out.phase)

My question is: can my large residual degrees of freedom compensate for my lack of biological replicates at each time point? In other words, can I use the standard pipeline with estimateDisp(y, design, robust=TRUE) to process my data, or do I need to (a) choose a reasonable BCV value (as suggested in the manual) or (b) estimate only the trended dispersion?

Following the standard pipeline, I was wondering whether the oscillating genes (which are obviously also the ones with a lot of sample-to-sample variability in my case) get assigned larger than "reasonable" tagwise dispersions. I do not have problems identifying them with the standard pipeline, but I am trying to understand what assumptions I am making.

Thank you.

Answer: edgeR: time series analysis
2.7 years ago, Aaron Lun (25k), Cambridge, United Kingdom, wrote:

As long as your model has non-zero residual d.f., you can estimate a dispersion for each gene. With time series, the general assumption is that expression follows some smooth trend with respect to time - deviations from that trend can be used for dispersion estimation. Obviously, the more residual d.f. you have, the more precise your dispersion estimates are, and the more reliable your downstream analyses will be. This is easiest to achieve with more replicates, as it avoids the need to make strong assumptions about your response to time.
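
As a rough check of the residual d.f. for the design in the question, the following base R sketch rebuilds the design matrix and subtracts its rank from the number of samples. The names n.samples, n.coefs and resid.df are only illustrative, and this assumes one library per time point as described in the question.

```r
# Time points and design taken from the question.
time <- seq(2, 22, by = 2)
in.phase <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)
design <- model.matrix(~ in.phase + out.phase)

# Residual d.f. = number of samples minus the rank of the design matrix.
# (n.samples, n.coefs and resid.df are illustrative names.)
n.samples <- nrow(design)     # 11 time points, one library each
n.coefs <- qr(design)$rank    # intercept + in.phase + out.phase
resid.df <- n.samples - n.coefs
resid.df
```

With 11 samples and 3 coefficients this leaves 8 residual d.f., so a dispersion can be estimated for each gene, provided the sinusoidal model is an adequate description of the time trend.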

In your case, you've applied the cosine and sine functions under the assumption that one cycle takes exactly 22 time units. I can't remember all my trigonometric identities, but I don't think that linear sums of these functions can be used to represent situations where cycles are faster or slower. If a gene had a different cycling time, its expression profile with respect to time would not be modelled well, resulting in an inflated dispersion.
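
For reference, the relevant identity is a*cos(w*t) + b*sin(w*t) = R*cos(w*t - phi), with R = sqrt(a^2 + b^2) and phi = atan2(b, a). A linear combination of the two terms can therefore change the amplitude and phase, but not the frequency. The sketch below illustrates the consequence with two noise-free toy profiles fitted by ordinary least squares. This is an assumption made for illustration only: it is not edgeR's negative binomial GLM, and the profiles slow.gene and fast.gene are hypothetical, not taken from any real data.

```r
# Design from the question: one cycle is assumed to take 22 time units.
time <- seq(2, 22, by = 2)
in.phase <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)
design <- model.matrix(~ in.phase + out.phase)

# Hypothetical noise-free log-expression profiles.
# slow.gene cycles every 22 units with an arbitrary phase shift.
# fast.gene cycles every 11 units, i.e. twice as fast as assumed.
slow.gene <- 3 + 2 * cos(2 * pi / 22 * time - 1)
fast.gene <- 3 + 2 * cos(2 * pi / 11 * time)

# Ordinary least-squares fit of each profile to the same design.
rss.slow <- sum(lm.fit(design, slow.gene)$residuals^2)
rss.fast <- sum(lm.fit(design, fast.gene)$residuals^2)

# Residual sums of squares: near zero for the gene with the assumed
# period, clearly non-zero for the gene with the shorter period.
c(slow = rss.slow, fast = rss.fast)
```

The phase-shifted 22-unit profile is reproduced essentially exactly, whereas the oscillation of the 11-unit profile is left in the residuals. In a real analysis, that unexplained systematic variation would be absorbed into the gene's dispersion estimate.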
