Question: DESeq2 design formula - correlation with ordered variable
t.kuilman wrote:

I would like to analyse my RNAseq data using DESeq2, and have the following design:

     time.point cell.line sensitivity
A_0           0         A           6
B_0           0         B           5
C_0           0         C           3
D_0           0         D           4
E_0           0         E           2
F_0           0         F           1
A_4           4         A           6
B_4           4         B           5
C_4           4         C           3
D_4           4         D           4
E_4           4         E           2
F_4           4         F           1
A_14         14         A           6
B_14         14         B           5
C_14         14         C           3
D_14         14         D           4
E_14         14         E           2
F_14         14         F           1

Basically, this is a time-course RNAseq experiment for a specific drug treatment (t = 0 / 4 / 14), where we have measured sensitivity (1-6) using an independent assay for 6 cell lines (A-F). Is it possible to answer the following questions using DESeq2:

* which genes are differentially expressed as a function of the sensitivity of the cell line when only regarding t = 0?

* which genes are differentially expressed as a function of the sensitivity of the cell line at t = 4/14, correcting for differences at t = 0?

Answer: DESeq2 design formula - correlation with ordered variable
Michael Love wrote:

When you say “function of the sensitivity”, what do you mean exactly? Do you want to treat sensitivity as a numeric variable? And what kind of functions do you want to use?

I would like to find genes whose expression consistently decreases / increases with increasing sensitivity. Indeed, sensitivity would be treated as a numeric / ordered variable in this case. I am not sure I understand your question about functions: do you mean which DESeq2 functions I would like to use, or which metric would be good?

You can use any kind of mathematical function, but if you just want to find increases or decreases you can try a linear trend on the log scale of gene expression. This is what happens when you include a continuous covariate in the design, e.g. ~sensitivity. For the time series, you can use ~time + sensitivity + time:sensitivity, with time as a factor. This gives two interaction coefficients of interest, one for each time point after 0, which you can pull out using the name argument of results(). You can ignore the message that DESeq2 prints when you include a continuous covariate in the design.
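As a rough sketch of the design described above, the following rebuilds the sample table from the question and inspects the model matrix. The column name `time` (a factor version of time.point) and the simulated count matrix are assumptions made for illustration only; the DESeq2 part runs only if the package is installed, and the coefficient names are looked up with resultsNames() rather than hard-coded.

```r
# Sample table from the question. `time` is stored as a factor (an assumption
# of this sketch) so that each later time point gets its own sensitivity slope.
coldata <- data.frame(
  time        = factor(rep(c(0, 4, 14), each = 6)),
  cell.line   = factor(rep(LETTERS[1:6], times = 3)),
  sensitivity = rep(c(6, 5, 3, 4, 2, 1), times = 3)
)
rownames(coldata) <- paste(coldata$cell.line, coldata$time, sep = "_")

design <- ~ time + sensitivity + time:sensitivity

# Base R view of the coefficients this design estimates.
mm <- model.matrix(design, data = coldata)
print(colnames(mm))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Simulated counts, purely so the calls below have something to run on.
  set.seed(1)
  counts <- matrix(
    rnbinom(200 * nrow(coldata), mu = 100, size = 10),
    ncol = nrow(coldata),
    dimnames = list(paste0("gene", 1:200), rownames(coldata))
  )
  dds <- DESeq2::DESeqDataSetFromMatrix(counts, colData = coldata,
                                        design = design)
  dds <- DESeq2::DESeq(dds, fitType = "mean", quiet = TRUE)

  # Look up every coefficient that involves sensitivity, then extract each
  # one by name.
  slope_terms <- grep("sensitivity", DESeq2::resultsNames(dds), value = TRUE)
  res_list <- lapply(slope_terms,
                     function(nm) DESeq2::results(dds, name = nm))
  names(res_list) <- slope_terms
  print(slope_terms)
}
```

In this parameterisation the main sensitivity coefficient is the slope at t = 0 (the first question), and each interaction coefficient is the change in that slope at t = 4 or t = 14 relative to t = 0 (the second question).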

That was exactly what I was looking for, thank you very much! Just out of curiosity, you mention that one can use other mathematical functions. Is there for instance a rank-based method implemented in DESeq2 too?

You can use any function accepted by R's formula() and model.matrix() functions. See ?formula, ?poly, ?ns in the splines package, etc.
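To illustrate, here is a minimal base R sketch of how poly() and ns() terms expand into design matrix columns. The choice of a quadratic polynomial and of df = 3 for the spline is arbitrary and only for illustration; the `time` factor is again an assumed recoding of time.point.

```r
# Sample table from the question, with time recoded as a factor.
coldata <- data.frame(
  time        = factor(rep(c(0, 4, 14), each = 6)),
  sensitivity = rep(c(6, 5, 3, 4, 2, 1), times = 3)
)

# Quadratic trend in sensitivity: intercept plus two orthogonal
# polynomial columns.
mm_poly <- model.matrix(~ poly(sensitivity, 2), data = coldata)

# Natural cubic spline in sensitivity with 3 degrees of freedom:
# intercept plus three basis columns.
mm_ns <- model.matrix(~ splines::ns(sensitivity, df = 3), data = coldata)

print(dim(mm_poly))
print(dim(mm_ns))
```

Either formula could be passed as the design, possibly combined with time terms. With only six distinct sensitivity values, flexible curves use up degrees of freedom quickly, so a low-order term is probably the practical limit here.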

For a rank-based approach, I would use a series of indicator variables.
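One possible reading of this suggestion, sketched below, is to replace the numeric sensitivity with step indicators that mark whether a cell line's rank exceeds a threshold. The thresholds (2 and 4) and the column names `gt2` and `gt4` are hypothetical choices for illustration, not something specified in the answer.

```r
# Sample table from the question, with time recoded as a factor.
coldata <- data.frame(
  time        = factor(rep(c(0, 4, 14), each = 6)),
  sensitivity = rep(c(6, 5, 3, 4, 2, 1), times = 3)
)

# Step indicators on the sensitivity rank (hypothetical thresholds).
coldata$gt2 <- as.numeric(coldata$sensitivity > 2)
coldata$gt4 <- as.numeric(coldata$sensitivity > 4)

# Each indicator coefficient is the log fold change for crossing that
# threshold, so only the ordering of the ranks matters, not their spacing.
mm_ind <- model.matrix(~ gt2 + gt4, data = coldata)
print(colSums(mm_ind))
```

Using one indicator per rank would not work at a single time point in this experiment, because each rank belongs to exactly one cell line and the model would have as many coefficients as samples.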