Question: How to get the regression slope coefficient from DESeq2 analysis of continuous data
10 days ago by
Edie Crosse0 wrote:


I have used DESeq2 to find genes whose expression level correlates with a continuous variable, in this case the number of cells in a population (pop). We noticed a batch effect in our data that depends on the day we processed the experiment (day), so my design setup was as follows:

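library(DESeq2)
# Sample table: processing day as a batch factor, population size as a continuous covariate
cData <- data.frame(day = as.factor(df$day), pop = df[, x])
rownames(cData) <- colnames(d)
dds <- DESeqDataSetFromMatrix(countData = d, colData = cData, design = ~ day + pop)
# Fit the model on the DESeqDataSet created above
dds <- DESeq(dds)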

The output of the results gives you log2FoldChange which is the "per unit of change of that variable." I am wondering whether there is a way to extract or infer from the data the coefficient of the regression slope? We would like to be able to deduce the size of a cell population from gene expression levels within a tissue sample.

Many thanks! Edie

deseq2 • 63 views
modified 9 days ago by James W. MacDonald50k • written 10 days ago by Edie Crosse0

MacDonald is right: the value returned is the slope, no matter what the label is. You can verify this yourself by plotting the log2 of the normalized counts against your cell number; the log2FoldChange DESeq2 reports should be close to the slope of that line. I did that check on my own data, and it works out.
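A minimal sketch of that check, using simulated values for a single gene rather than real DESeq2 output. The names pop, true_slope and norm_counts are hypothetical; with real data the counts would come from something like counts(dds, normalized = TRUE). DESeq2 fits a negative binomial GLM and the design above also includes day, so an ordinary least-squares slope should only be expected to agree approximately.

```r
# Simulated stand-in for one gene (all values are assumptions for illustration)
set.seed(1)
pop <- seq(10, 200, length.out = 24)   # hypothetical cell numbers per sample
true_slope <- 0.02                     # assumed log2 change per unit of pop
log2_expr <- 5 + true_slope * pop + rnorm(length(pop), sd = 0.2)
norm_counts <- 2^log2_expr             # back to the count scale

# Regress log2 normalized counts on the continuous variable
fit <- lm(log2(norm_counts) ~ pop)
slope <- unname(coef(fit)["pop"])
print(slope)

# Visual version of the same check:
# plot(pop, log2(norm_counts)); abline(fit)
```

With real data, adding the batch term to the comparison model, for example lm(log2(norm_counts) ~ day + pop), would match the design more closely.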

modified 9 days ago • written 9 days ago by swbarnes2170
Answer: How to get the regression slope coefficient from DESeq2 analysis of continuous data
9 days ago by
James W. MacDonald50k wrote:

The canonical analysis for RNA-Seq is ANOVA, so the default for most software, including DESeq2, is to label the coefficient as a log fold change (log2FoldChange in DESeq2). In that setting it is definitely not the 'per unit change of that variable', but instead the log fold change between groups.

If you use a continuous variable, you still get the coefficient (in this case the regression slope, but still labeled as a log fold change), and now the interpretation is the log2 change in expression for a unit change in your continuous variable.
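As a small worked example of that interpretation: a slope on the log2 scale turns into a multiplicative change on the count scale. The value 0.02 and the 50-unit difference below are illustrative assumptions, not results from any dataset.

```r
# A log2 slope for a continuous covariate, converted to fold changes
lfc <- 0.02                          # illustrative log2FoldChange per unit of pop
fold_per_unit <- 2^lfc               # multiplicative change for +1 unit of pop
delta <- 50                          # hypothetical difference in pop between two samples
fold_for_delta <- 2^(lfc * delta)    # multiplicative change for +delta units
print(c(fold_per_unit = fold_per_unit, fold_for_delta = fold_for_delta))
```

So a per-unit fold change of about 1.014 compounds to roughly a doubling over 50 units, which is why a small log2FoldChange for a continuous variable is not necessarily a small effect.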


Thank you very much for your response. We still have some questions about how to extract correlation coefficients that would allow us to predict cell population numbers from gene expression.

1) We need R or R2 (the correlation coefficient) and the significance of this coefficient. How does this relate to the log2FC, which you mentioned is the slope coefficient? If we have a log2FC of 0.02, this gives a fold change of 1.014. How does this correspond to the correlation coefficient, which should be between 0 and 1?

2) Does the adjusted p-value show us how well our data fit the linear regression model?

3) Considering that the linear regression model is Y = B0 + B1 * X (where Y is the cell number, B0 is the y-intercept, B1 is the slope and X is the gene expression), is there a way to extract B0 from the data? And does B1 correspond to the log2FC?

Your advice is much appreciated.

modified 3 days ago • written 3 days ago by Edie Crosse0

We don’t provide a correlation coefficient in DESeq2. The adjusted p-value helps you find a set of genes where the FDR is bounded, given the specified model. It’s not a model fit statistic.

You may want to discuss this with a statistician.
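Since DESeq2 does not report a correlation coefficient, one option is to compute it separately. The following is only a sketch in base R on simulated data: log2_expr stands in for the log2 normalized counts of one gene, all values are hypothetical, and the day batch effect is ignored. It also fits the inverse regression (pop predicted from expression), which is the direction the Y = B0 + B1 * X model in the question describes.

```r
# Simulated single-gene data (assumed values, not DESeq2 output)
set.seed(2)
pop <- seq(10, 200, length.out = 24)
log2_expr <- 5 + 0.02 * pop + rnorm(length(pop), sd = 0.5)

# Pearson correlation between expression and pop, with its p-value
ct <- cor.test(log2_expr, pop)
r <- unname(ct$estimate)
r_squared <- r^2

# Inverse regression: predict pop from expression (Y = B0 + B1 * X)
inv_fit <- lm(pop ~ log2_expr)
b0 <- unname(coef(inv_fit)["(Intercept)"])
b1 <- unname(coef(inv_fit)["log2_expr"])
print(c(r = r, r_squared = r_squared, b0 = b0, b1 = b1, p = ct$p.value))
```

Note that B1 here is not the log2FoldChange: DESeq2 models expression as a function of pop, whereas this fit models pop as a function of expression, and the two slopes differ unless the correlation is perfect. Whether a single-gene linear predictor is adequate is a modelling question for the statistician mentioned above.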

written 3 days ago by Michael Love23k