Question: edgeR: time series analysis
15 months ago, BharathAnanth wrote:


I have RNA-seq time course data consisting of 11 individual time points. However, I do not have replicates at any time point. To detect oscillations, I am trying to fit a simple linear model of the form:

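# 11 time points: 2, 4, ..., 22
time <- seq(2, 22, by = 2)

# Cosine and sine terms for a fixed period of 22 time units
in.phase <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)

# Intercept plus the two oscillation terms
design <- model.matrix(~ in.phase + out.phase)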

My question is whether my large residual degrees of freedom can compensate for the lack of biological replicates at each time point. In other words, can I use the standard pipeline with estimateDisp(y, design, robust=TRUE) to process my data, or do I need to (a) choose a reasonable BCV value (as suggested in the manual) or (b) estimate only the trended dispersion?

Following the standard pipeline, I was wondering whether the oscillating genes (which, in my case, are obviously also the ones with a lot of sample-to-sample variability) get assigned larger than "reasonable" tagwise dispersions. I do not have problems identifying them with the standard pipeline, but I am trying to understand what assumptions I am making.

Thank you.

15 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

As long as your model has non-zero residual d.f., you can estimate a dispersion for each gene. With time series, the general assumption is that expression follows some smooth trend with respect to time - deviations from that trend can be used for dispersion estimation. Obviously, the more residual d.f. you have, the more precise your dispersion estimates are, and the more reliable your downstream analyses will be. This is easiest to achieve with more replicates, as it avoids the need to make strong assumptions about your response to time.
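To make the residual d.f. concrete for the design in the question, here is a minimal sketch in base R. It assumes one library per time point, in the same order as `time`; the name `resid.df` is only illustrative.

```r
# Rebuild the design from the question: 11 unreplicated time points
time <- seq(2, 22, by = 2)
in.phase <- cos(2 * pi / 22 * time)
out.phase <- sin(2 * pi / 22 * time)
design <- model.matrix(~ in.phase + out.phase)

# Residual d.f. = number of samples minus the rank of the design matrix
# (11 samples, 3 coefficients: intercept, cosine, sine)
resid.df <- nrow(design) - qr(design)$rank
resid.df
```

These residual d.f. are the per-gene information available for dispersion estimation, and they only reflect biological variability to the extent that the fitted trend is adequate for that gene.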

In your case, you've applied the cosine and sine functions under the assumption that one cycle takes exactly 22 time units. I can't remember all my trigonometric identities, but I don't think that linear sums of these functions can be used to represent situations where cycles are faster or slower. If a gene had a different cycling time, its expression profile with respect to time would not be modelled well, resulting in an inflated dispersion.
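The relevant identity is the standard harmonic addition formula. Writing the two oscillation coefficients as $\beta_1$ and $\beta_2$ (symbols introduced here for illustration, not taken from the thread), a linear sum of the cosine and sine terms is a single cosine of the same frequency:

```latex
\[
\beta_1 \cos(\omega t) + \beta_2 \sin(\omega t) = A \cos(\omega t - \varphi),
\qquad A = \sqrt{\beta_1^2 + \beta_2^2},
\quad \varphi = \mathrm{atan2}(\beta_2, \beta_1),
\quad \omega = \frac{2\pi}{22}.
\]
```

The two coefficients can therefore absorb any amplitude $A$ and phase $\varphi$, but $\omega$ is fixed by the design. A gene cycling with a different period lies outside what these two columns can represent, and its lack of fit ends up in the residuals.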
