Question: Predict circadian expression with edgeR
0
gravatar for ri.lars
12 months ago by
ri.lars40
ri.lars40 wrote:

I have used edgeR and limoRhyde to predict differential expression in a circadian experiment. Below ZT_cos and ZT_sin is sin and cos transformations of the time variable as calculated by limoRhyde. 

The model is of the form model.matrix(~ 0 + Genotype + Genotype:(ZT_cos + ZT_sin)) which results in a model matrix that looks like:

Control  KO  Control:ZT_cos Control:ZT_sin  KO:ZT_cos  KO:ZT_sin
1        0   0.82           0.5             0          0
0        1   0              0               0.82       0.5

etc...

After running the differential expression analysis I would like to calculate the expected expression level at an arbitrary timepoint, I tried generating ZT_cos and ZT_sin for a large number of timepoints and then calculating expected expression in the control condition as:

expression = logFC_Control + logFC_Control:ZT_sin * ZT_sin + logFC_Control:ZT_cos * ZT_cos

The problem is that my model is made without an intercept as this makes differential expression more straightforward (the actual setup is more complicated than indicated above). This means that logFC_KO and logFC_Control have values that are meaningful in relation to each other but not on their own. By playing around with the results I noticed that all that is missing is a constant to get what seems like a meaningful fit - but I cannot figure out what the constant is. I tried the average offset, but that quite far off.

I guess my question boils down to:

Given a model without an intercept, what is the value that the logFC's are calculated relative to?

ADD COMMENTlink modified 12 months ago by Aaron Lun25k • written 12 months ago by ri.lars40
Answer: Predict circadian expression with edgeR
2
gravatar for Aaron Lun
12 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

I assume that you're talking about differential expression between WT and KO, and that the (co)sine functions are as usual for some constant frequency. The interpretation is pretty simple; your log-fold changes are the differences between the two genotypes if there were no time effect. This must be the case because it is impossible for the (co)sine predictor terms to be all zero for any sample in the design matrix, unless the amplitude of the time effect is zero. The presence or absence of an intercept isn't really relevant here.

ADD COMMENTlink modified 12 months ago • written 12 months ago by Aaron Lun25k

Yes, that is what I was hoping to model - good to hear that it is correct. My question was more: "how do I use the estimated logFCs to calculate the expected CPMs at a time point I did not observe?"

Today I realised I could calculate the expression as AveLogCPM + (logFC_control - logFC_KO) + time_effect for the expression in the control condition. I believe the first two terms are the constant factor I was looking for above.

A second question: If I have multiple conditions, how do I calculate how an individual condition deviate from the average expression? Is it AveLogCPM + logFC_cond1 - mean(logFC_cond1 + ... + logFC_condN)?

ADD REPLYlink modified 12 months ago • written 12 months ago by ri.lars40

There's no need to use the log-fold changes for what you want to do. You have the coefficients of the fitted GLM, so you can just compute a linear combination of terms for the given genotype. Specifically:

exprs_WT_time = Control + Control:ZT_cos * X + Control:ZT_sin * Y

where Control, Control:ZT_cos and Control:ZT_sin are the estimated values of the coefficients of the same name for a given gene, and X and Y are the (co)sine values for the time point that you want. It's simple to do the same for KO.

This will give you a per-observation value equivalent to (i) dividing the corresponding count by the library size, assuming the offsets are log-library sizes; and (ii) log-transforming. To obtain log-CPMs, simply add log(1e6) to the computed value - you can then divide by log(2) or log(10) or whatever you want the base of the log-transform to be.

ADD REPLYlink modified 12 months ago • written 12 months ago by Aaron Lun25k

Great! That was exactly what I wanted. I tried using the model coefficients at first but I did not realise I needed to scale them so they were all off.

ADD REPLYlink written 12 months ago by ri.lars40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 415 users visited in the last hour