How to get the regression slope coefficient from DESeq2 analysis of continuous data


I have used DESeq2 to find genes whose expression level correlates with a continuous variable, in this case the number of cells in a population (pop). We noticed a batch effect in our data that depends on the day we processed the experiment (day). So my design setup was as follows:

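library(DESeq2)
# d: count matrix (genes x samples); df: sample table; x: column of df holding the population size
cData <- data.frame(day = as.factor(df$day), pop = df[, x])
rownames(cData) <- colnames(d)
dds <- DESeqDataSetFromMatrix(countData = d, colData = cData, design = ~ day + pop)
dds <- DESeq(dds)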

The output of the results gives you log2FoldChange which is the "per unit of change of that variable." I am wondering whether there is a way to extract or infer from the data the coefficient of the regression slope? We would like to be able to deduce the size of a cell population from gene expression levels within a tissue sample.

Many thanks! Edie


MacDonald is right...the value returned is the slope, no matter what the label is. You can verify this yourself by plotting the log2 of the normalized counts against your cell number; the number DESeq gives you should be the slope of that line. I've done that check myself against my own data, and it works out.
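The check described above can be sketched in base R. This is a minimal illustration on simulated counts for a single gene, not the poster's data: the names `true_slope`, `batch`, `mu` and the chosen values are assumptions made for the example, and ordinary least squares on log2 counts only approximates the negative binomial GLM that DESeq2 fits.

```r
# Simulated stand-in for one gene; all names and values are illustrative.
set.seed(42)
n   <- 40
day <- factor(rep(c("d1", "d2"), each = n / 2))  # processing day (batch)
pop <- runif(n, min = 0, max = 100)               # continuous covariate (cell number)

true_slope <- 0.02                                # log2 change in expression per unit of pop
batch      <- ifelse(day == "d2", 0.5, 0)         # log2 shift for the second day

# Negative binomial counts whose log2 mean is linear in pop
mu     <- 2^(8 + batch + true_slope * pop)
counts <- rnbinom(n, mu = mu, size = 100)

# Least-squares fit on log2 counts, adjusting for day as in the design ~ day + pop
fit <- lm(log2(counts + 0.5) ~ day + pop)
est_slope <- unname(coef(fit)["pop"])
est_slope  # should land close to true_slope
```

On real data, the same comparison would presumably use one row of `counts(dds, normalized = TRUE)` in place of the simulated `counts`, with the fitted slope compared against `results(dds, name = "pop")$log2FoldChange` for that gene. The two numbers should be close but not identical.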


The canonical analysis for RNA-Seq is ANOVA, so the default for most software, including DESeq2, is to label the coefficient 'logFC'. In that setting it is not the 'per unit change of that variable' but the log fold change between groups.

If you use a continuous variable, you still get the coefficient (in this case the regression slope, but still labeled 'logFC'), but now the interpretation is the log change in expression for a unit change in your continuous variable.
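Written out, the interpretation looks roughly like this. The notation follows the usual description of the DESeq2 model (counts $K_{ij}$ for gene $i$ in sample $j$, size factor $s_j$, dispersion $\alpha_i$); the indicator $d_j$ for the second processing day and the value 0.02 are illustrative assumptions, not numbers taken from the analysis above.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \beta_{i0} + \beta_{i,\mathrm{day}} \, d_j + \beta_{i,\mathrm{pop}} \, \mathrm{pop}_j

% Changing pop by \Delta units multiplies the expected expression by
\frac{q_{ij}(\mathrm{pop}_j + \Delta)}{q_{ij}(\mathrm{pop}_j)} = 2^{\beta_{i,\mathrm{pop}} \Delta}

% Example with \beta_{i,\mathrm{pop}} = 0.02:
% \Delta = 1  gives 2^{0.02} \approx 1.014
% \Delta = 50 gives 2^{1} = 2
```

So the reported log2FoldChange for `pop` is $\beta_{i,\mathrm{pop}}$, a slope on the log2 scale. It is not a correlation coefficient and is not bounded between 0 and 1.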


Thank you very much for your response. We still have some questions about how to extract correlation coefficients that would allow us to predict cell population numbers from gene expression.

1) We need R or R² (the correlation coefficient) and the significance of this coefficient. How does this relate to the logFC, which you mentioned is the slope coefficient? A log2FC of 0.02 corresponds to a fold change of 1.014; how does this relate to the correlation coefficient, which should be between 0 and 1?

2) Does the adjusted p-value show us how significantly our data fit the linear regression model?

3) Considering that the linear regression model is Y = B0 + B1 * X (where Y is the cell number, B0 is the y-intercept, B1 is the slope and X is the gene expression), is there a way to extract B0 from the data? And does B1 correspond to the logFC?

Your advice is much appreciated.


We don’t provide a correlation coefficient in DESeq2. The adjusted p-value helps you find a set of genes for which the FDR is bounded, given the specified model. It’s not a model fit statistic.

You may want to discuss this with a statistician.
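For the model that DESeq2 does fit (expression as the response, `pop` as a predictor), the per-gene intercept and slope can be read from the fitted object. The sketch below uses DESeq2's built-in simulated dataset with made-up `day` and `pop` columns, so every value is illustrative. Note that this intercept is the log2 expression at `pop = 0` on the reference day, not the B0 of a regression of cell number on expression.

```r
library(DESeq2)

# Simulated example data; the day and pop columns are assumptions for illustration.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12)
dds$day <- factor(rep(c("1", "2"), times = 6))
dds$pop <- seq(10, 120, by = 10)
design(dds) <- ~ day + pop

dds <- DESeq(dds, quiet = TRUE)

resultsNames(dds)            # names of the fitted coefficients
betas <- coef(dds)           # genes x coefficients, on the log2 scale
head(betas[, c("Intercept", "pop")])

# The slope for pop is what results() reports as log2FoldChange
res <- results(dds, name = "pop")
head(res$log2FoldChange)
```

Predicting cell number from expression would be a separate, inverse regression fitted outside DESeq2, which is where a statistician's input is likely to help.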

