Question: Predict circadian expression with edgeR
0
12 months ago by
ri.lars40
ri.lars40 wrote:

I have used edgeR and limoRhyde to predict differential expression in a circadian experiment. Below ZT_cos and ZT_sin is sin and cos transformations of the time variable as calculated by limoRhyde.

The model is of the form model.matrix(~ 0 + Genotype + Genotype:(ZT_cos + ZT_sin)) which results in a model matrix that looks like:

Control  KO  Control:ZT_cos Control:ZT_sin  KO:ZT_cos  KO:ZT_sin
1        0   0.82           0.5             0          0
0        1   0              0               0.82       0.5


etc...

After running the differential expression analysis I would like to calculate the expected expression level at an arbitrary timepoint, I tried generating ZT_cos and ZT_sin for a large number of timepoints and then calculating expected expression in the control condition as:

expression = logFC_Control + logFC_Control:ZT_sin * ZT_sin + logFC_Control:ZT_cos * ZT_cos

The problem is that my model is made without an intercept as this makes differential expression more straightforward (the actual setup is more complicated than indicated above). This means that logFC_KO and logFC_Control have values that are meaningful in relation to each other but not on their own. By playing around with the results I noticed that all that is missing is a constant to get what seems like a meaningful fit - but I cannot figure out what the constant is. I tried the average offset, but that quite far off.

I guess my question boils down to:

Given a model without an intercept, what is the value that the logFC's are calculated relative to?

modified 12 months ago by Aaron Lun25k • written 12 months ago by ri.lars40
2
12 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

I assume that you're talking about differential expression between WT and KO, and that the (co)sine functions are as usual for some constant frequency. The interpretation is pretty simple; your log-fold changes are the differences between the two genotypes if there were no time effect. This must be the case because it is impossible for the (co)sine predictor terms to be all zero for any sample in the design matrix, unless the amplitude of the time effect is zero. The presence or absence of an intercept isn't really relevant here.

Yes, that is what I was hoping to model - good to hear that it is correct. My question was more: "how do I use the estimated logFCs to calculate the expected CPMs at a time point I did not observe?"

Today I realised I could calculate the expression as AveLogCPM + (logFC_control - logFC_KO) + time_effect for the expression in the control condition. I believe the first two terms are the constant factor I was looking for above.

A second question: If I have multiple conditions, how do I calculate how an individual condition deviate from the average expression? Is it AveLogCPM + logFC_cond1 - mean(logFC_cond1 + ... + logFC_condN)?

There's no need to use the log-fold changes for what you want to do. You have the coefficients of the fitted GLM, so you can just compute a linear combination of terms for the given genotype. Specifically:

exprs_WT_time = Control + Control:ZT_cos * X + Control:ZT_sin * Y

where Control, Control:ZT_cos and Control:ZT_sin are the estimated values of the coefficients of the same name for a given gene, and X and Y are the (co)sine values for the time point that you want. It's simple to do the same for KO.

This will give you a per-observation value equivalent to (i) dividing the corresponding count by the library size, assuming the offsets are log-library sizes; and (ii) log-transforming. To obtain log-CPMs, simply add log(1e6) to the computed value - you can then divide by log(2) or log(10) or whatever you want the base of the log-transform to be.