Question: How to get the regression slope coefficient from DESeq2 analysis of continuous data
Edie Crosse wrote, 9 weeks ago:


I have used DESeq2 to find genes whose expression level correlates with a continuous variable, in this case the number of cells in a population (pop). We noticed a batch effect in our data that depends on the day the experiment was processed (day), so I set up the design as follows:

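    library(DESeq2)
    # processing day is a factor (batch); pop is the numeric cell count
    cData <- data.frame(day = as.factor(df$day), pop = df[, x])
    rownames(cData) <- colnames(d)
    # day is adjusted for; the pop coefficient is the per-unit slope
    dds <- DESeqDataSetFromMatrix(countData = d, colData = cData, design = ~ day + pop)
    dds <- DESeq(dds)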
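A formula such as `~ day + pop` is turned into a design matrix before fitting, and looking at that matrix shows why the `pop` coefficient is a slope. The sketch below applies base R's `model.matrix()` to a small made-up `cData` (hypothetical values, not the poster's data; DESeq2 builds its design matrix from the formula in essentially this way): `day` becomes indicator columns relative to the first day, while `pop` stays a single numeric column.

```r
# Hypothetical toy colData: three processing days, a numeric cell count per sample
cData <- data.frame(
  day = factor(c(1, 1, 2, 2, 3, 3)),
  pop = c(120, 340, 95, 410, 220, 160)
)

# Same right-hand side as the DESeq2 design above
mm <- model.matrix(~ day + pop, data = cData)
print(mm)

# 'day' becomes indicator columns (day2, day3) relative to day 1;
# 'pop' stays one numeric column, so its coefficient is a slope
colnames(mm)
```

Because `pop` enters as raw numbers, its fitted coefficient describes the change in (log2) expression for each one-unit step in `pop`, holding `day` fixed.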

The results table reports a log2FoldChange, which is described as the change "per unit of change of that variable". Is there a way to extract, or infer from the data, the regression slope coefficient? We would like to be able to estimate the size of a cell population from gene expression levels in a tissue sample.

Many thanks! Edie


MacDonald is right: the value returned is the slope, whatever the column is called. You can verify this yourself by plotting the log2 normalized counts for a gene against your cell number; the number DESeq2 gives you should be close to the slope of that line. It will not match exactly, because DESeq2 fits a negative binomial GLM rather than a least-squares line, and your design also adjusts for day. I've done that check on my own data, and it works out.
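The check described above can be mimicked on simulated data. This is an assumption-laden sketch, not DESeq2 itself: it draws negative binomial counts for one hypothetical gene whose log2 mean rises by a known 0.02 per cell, then compares (a) the least-squares slope of log2 counts against cell number and (b) the slope of a negative binomial GLM fitted with `MASS::glm.nb()`, converted from the natural-log scale to log2. Both should land near 0.02; DESeq2 additionally shares dispersion information across genes, so its estimate would not be identical to either.

```r
library(MASS)  # for glm.nb(); MASS ships with standard R installations

set.seed(1)
n_samples  <- 40
true_slope <- 0.02                      # hypothetical log2 change per cell
pop <- runif(n_samples, 50, 250)        # hypothetical cell counts per sample

# Mean count on the log2 scale: intercept of 6 (64 reads) plus slope * pop
mu <- 2^(6 + true_slope * pop)
counts <- rnbinom(n_samples, mu = mu, size = 20)  # size = 1 / dispersion

# (a) least-squares slope of log2 counts vs. cell number (pseudocount of 1)
ols_slope <- unname(coef(lm(log2(counts + 1) ~ pop))["pop"])

# (b) negative binomial GLM with a log link; divide by log(2) for log2 units
nb_fit   <- glm.nb(counts ~ pop)
nb_slope <- unname(coef(nb_fit)["pop"]) / log(2)

c(true = true_slope, ols = ols_slope, nb_glm = nb_slope)
```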

Reply by swbarnes2, 9 weeks ago
Answer: How to get the regression slope coefficient from DESeq2 analysis of continuous data
James W. MacDonald (United States) wrote, 9 weeks ago:

The canonical analysis for RNA-Seq is ANOVA, so the default for most software, including DESeq2, is to label the coefficient as a log fold change ('logFC' in limma and edgeR, 'log2FoldChange' in DESeq2). For a factor, that coefficient is definitely not a 'per unit change of that variable' but the log fold change between groups.

If you use a continuous variable, you still get a coefficient (here the regression slope, although it is still labeled as a log fold change), but now it is interpreted as the log2 change in expression per unit change in your continuous variable.
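In DESeq2 the coefficient for a numeric covariate can be pulled out by name. The sketch below assumes DESeq2 is installed and uses its `makeExampleDESeqDataSet()` helper to simulate a small dataset, to which a hypothetical numeric `pop` column is added; in a real analysis you would use your own `dds`. `results(dds, name = "pop")` returns the per-unit log2 change as `log2FoldChange`, and `coef(dds)` returns all fitted coefficients on the log2 scale, including the intercept. DESeq2 may print a message recommending that numeric covariates be centered; subtracting the mean leaves the slope unchanged and only moves the intercept to the average `pop`.

```r
library(DESeq2)

set.seed(2)
# Simulated counts for 500 genes x 12 samples (betaSD = 0: no true effects)
dds <- makeExampleDESeqDataSet(n = 500, m = 12, betaSD = 0)

# Hypothetical numeric covariate standing in for cell number
dds$pop <- round(runif(ncol(dds), 50, 250))
design(dds) <- ~ pop

dds <- DESeq(dds, quiet = TRUE)

resultsNames(dds)                 # "Intercept" "pop"

# Slope: log2 change in normalized counts per one-unit increase in pop
res <- results(dds, name = "pop")
head(res$log2FoldChange)

# All coefficients on the log2 scale; "Intercept" is the fitted log2
# normalized count at pop = 0 (an extrapolation if no sample has 0 cells)
b <- coef(dds)
head(b[, c("Intercept", "pop")])
```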


Thank you very much for your response. We still have some questions about how to extract correlation coefficients that would allow us to predict cell population numbers from gene expression.

1) We need R or R² (the correlation coefficient or the coefficient of determination) and the significance of this coefficient. How does this relate to the log2 fold change, which you mentioned is the slope coefficient? A log2 fold change of 0.02 corresponds to a fold change of 2^0.02 ≈ 1.014 per unit of pop; how does this correspond to a correlation coefficient, which should lie between -1 and 1 (or between 0 and 1 for R²)?

2) Does the adjusted p-value show how well our data fit the linear regression model?

3) If the linear regression model is Y = B0 + B1 * X (where Y is cell number, B0 is the y-intercept, B1 is the slope and X is gene expression), is there a way to extract B0 from the data? And does B1 correspond to the log2 fold change?
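The conversion in question 1 gives a fold change, not a correlation: a log2 fold change of b per unit of pop means expression is multiplied by 2^b for each additional cell, and by 2^(b × Δ) over a difference of Δ cells. A minimal base-R check, using the 0.02 from the question and an arbitrary 50-cell difference chosen only for illustration (`fold_change` is a hypothetical helper, not a DESeq2 function):

```r
# Hypothetical helper: fold change implied by a per-unit log2 coefficient
# over a difference of 'delta' units of the covariate
fold_change <- function(lfc, delta = 1) 2^(lfc * delta)

fold_change(0.02)             # per cell: about 1.014
fold_change(0.02, delta = 50) # over 50 cells: 2^1 = 2
```

Neither number says how tightly samples scatter around the fitted line, which is what r or R² measures. Note also that in the DESeq2 model log2 expression is the response and pop the predictor (log2 of the mean normalized count = intercept + day effect + slope × pop), the reverse of the Y = B0 + B1 * X set-up in question 3.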

Your advice is much appreciated.

Reply by Edie Crosse, 8 weeks ago

We don’t provide a correlation coefficient in DESeq2. The adjusted p-value helps you find a set of genes for which the false discovery rate (FDR) is bounded, given the specified model. It’s not a model-fit statistic.

You may want to discuss this with a statistician.
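If a correlation or R² is still wanted for a particular gene, it has to be computed outside DESeq2. One possible sketch, assuming a log2-scale expression value per sample is available (for example from DESeq2's `vst()` or `log2(counts(dds, normalized = TRUE) + 1)`; here it is simulated), uses base R: `cor.test()` gives Pearson's r and a p-value for r = 0, and for a one-predictor linear regression `summary(lm(...))$r.squared` equals r². This ignores the day effect; adding `day` to the `lm()` formula would give R² for the adjusted model instead.

```r
set.seed(3)
# Hypothetical data: cell numbers and log2-scale expression of one gene
pop  <- runif(30, 50, 250)
expr <- 6 + 0.02 * pop + rnorm(30, sd = 0.4)

# Pearson correlation test: estimate is r, p.value tests r = 0
ct <- cor.test(expr, pop)
r  <- unname(ct$estimate)
ct$p.value

# Least-squares fit of expression on pop: intercept, slope and R^2
fit <- lm(expr ~ pop)
coef(fit)
r2 <- summary(fit)$r.squared

# For a single predictor with an intercept, R^2 is the squared correlation
all.equal(r2, r^2)

# Predicting cell number from expression swaps the roles: the slope changes,
# but R^2 for this one-predictor model stays the same
summary(lm(pop ~ expr))$r.squared
```

Whether such a per-gene fit is adequate for predicting cell number is a separate modelling question.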

Reply by Michael Love, 8 weeks ago