How to get the regression slope coefficient from DESeq2 analysis of continuous data
@edie-crosse-20807
Edinburgh

Hello,

I have used DESeq2 to find genes whose expression level correlates with the value of a continuous variable, in this case the number of cells in a population (pop). We noticed a batch effect in our data depending on the day we processed the experiment (day), so my design was set up as follows:

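```r
library(DESeq2)

# Sample table: processing day as a factor, cell number (column x of df) as a continuous covariate
cData <- data.frame(day = as.factor(df$day), pop = df[, x])
rownames(cData) <- colnames(d)  # rows must match the columns of the count matrix d
dds <- DESeqDataSetFromMatrix(countData = d, colData = cData, design = ~ day + pop)
dds <- DESeq(dds)  # fit the model on the DESeqDataSet created above
```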

The results output gives log2FoldChange, which is described as the change "per unit of change of that variable". Is there a way to extract or infer the regression slope coefficient from the data? We would like to be able to estimate the size of a cell population from gene expression levels within a tissue sample.

Many thanks! Edie


MacDonald is right: the value returned is the slope (on the log2 scale), no matter what the label is. You can verify this yourself by plotting the log2 of the normalized counts against your cell number; the number DESeq2 gives you should be the slope of that line. I've done that check myself against my own data, and it works out.
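A minimal base-R sketch of the check described in this reply, on simulated counts for one gene. The cell numbers, slope, intercept and dispersion below are assumptions chosen for illustration; with real data, one row of `counts(dds, normalized = TRUE)` would take the place of `norm_counts`. Expect only approximate agreement with DESeq2: its coefficient comes from a negative binomial GLM, is adjusted for `day` under `~ day + pop`, and is shrunk if `lfcShrink()` is used, whereas this is a plain least-squares line through log2(count + 1).

```r
set.seed(1)

n_samples  <- 40
pop        <- runif(n_samples, min = 100, max = 500)  # hypothetical cell numbers
true_slope <- 0.005  # assumed log2 change in expression per extra cell
intercept  <- 5      # assumed log2 expression at pop = 0

# Simulate negative binomial counts whose log2 mean is linear in pop.
# Size factors are taken as 1, so these counts stand in for normalized counts.
mu          <- 2^(intercept + true_slope * pop)
norm_counts <- rnbinom(n_samples, size = 20, mu = mu)

# Fit a straight line to log2(normalized count + 1) against pop
log2_expr <- log2(norm_counts + 1)
fit       <- lm(log2_expr ~ pop)
slope     <- unname(coef(fit)["pop"])
print(slope)

# The plot described in the reply, with the fitted line added
plot(pop, log2_expr, xlab = "cells in population (pop)",
     ylab = "log2(normalized count + 1)")
abline(fit)
```

The recovered slope should land close to `true_slope`; on real data, the closeness to DESeq2's `log2FoldChange` for `pop` is the thing being checked.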

@james-w-macdonald-5106
United States

The canonical analysis for RNA-Seq is ANOVA, so the default for most software, including DESeq2, is to label the coefficient 'logFC' (in DESeq2 the column is named 'log2FoldChange'). That label is definitely not the 'per unit change of that variable', but instead the log fold change between groups.

If you use a continuous variable, you still get the coefficient (in this case the regression slope, but still labeled 'logFC'), but now it is interpreted as the log2 change in expression for a one-unit change in your continuous variable.
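To make that interpretation concrete, here is a sketch of the model DESeq2 fits for the design `~ day + pop`. For gene i and sample j, counts are negative binomial with mean s_j q_ij (s_j is the size factor), and log2 q_ij is linear in the design variables; the day term is zero for the reference day:

$$
\log_2 q_{ij} = \beta_{i0} + \beta_{i,\mathrm{day}(j)} + \beta_{i,\mathrm{pop}}\,\mathrm{pop}_j
$$

Here β_i,pop is the value reported as log2FoldChange for `pop`. Holding day fixed, a difference of Δ cells multiplies the expected normalized count by

$$
\frac{q_{ij}(\mathrm{pop}_j + \Delta)}{q_{ij}(\mathrm{pop}_j)} = 2^{\beta_{i,\mathrm{pop}}\,\Delta}
$$

For example, with β_i,pop = 0.02, one extra cell corresponds to a fold change of 2^0.02 ≈ 1.014, and 50 extra cells to 2^(0.02 × 50) = 2^1 = 2.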


Thank you very much for your response. We still have some questions about how to extract correlation coefficients that will allow us to predict cell population numbers from gene expression.

1) We need R or R² (correlation coefficient) and the significance of this coefficient. How does this relate to the logFC, which you mentioned is the slope coefficient? If we have a log2FC of 0.02, this gives a fold change of 2^0.02 ≈ 1.014. How does this correspond to the correlation coefficient, which should lie between -1 and 1 (or between 0 and 1 for R²)?

2) Does the adjusted p-value show how well our data fit the linear regression model?

3) Considering that the linear regression model is Y = B0 + B1·X (where Y is the cell number, B0 the y-intercept, B1 the slope and X the gene expression), is there a way to extract B0 from the data? And does B1 correspond to the logFC?

Your advice is much appreciated.
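A minimal base-R sketch relating questions 1 and 3, on simulated data for a single gene (all values are made up for illustration; with real data, `log2_expr` could be log2 of one row of `counts(dds, normalized = TRUE)` plus a pseudocount, and the `day` batch is ignored here). DESeq2 models expression as a function of pop, while question 3 writes pop as a function of expression; the reverse regression below gives B0 and B1 in that form. The sketch also shows that slope and correlation are different quantities: the two regression slopes multiply to r², so neither equals r. This is ordinary least squares on one gene, not DESeq2's negative binomial fit, and its p-value is not corrected for testing many genes. For DESeq2's own expression model, the fitted log2-scale coefficients, including the intercept, should be available via `coef(dds)`.

```r
set.seed(2)

# Simulated data for one gene (illustrative values only)
n_samples <- 40
pop       <- runif(n_samples, min = 100, max = 500)        # hypothetical cell numbers
log2_expr <- 5 + 0.005 * pop + rnorm(n_samples, sd = 0.3)  # hypothetical log2 expression

# Direction DESeq2 models: expression as a function of pop
fit_expr <- lm(log2_expr ~ pop)
# Direction written in question 3: Y = cell number, X = gene expression
fit_pop  <- lm(pop ~ log2_expr)

b0 <- unname(coef(fit_pop)[1])  # B0: intercept (cells)
b1 <- unname(coef(fit_pop)[2])  # B1: slope (cells per log2 unit of expression)

# Correlation, R^2, and the correlation p-value
# (identical to the slope's t-test p-value in simple linear regression)
r         <- cor(pop, log2_expr)
r_squared <- summary(fit_pop)$r.squared
p_value   <- cor.test(pop, log2_expr)$p.value

# The two slopes are not reciprocals of each other: their product is r^2
slope_product <- unname(coef(fit_expr)[2] * coef(fit_pop)[2])

# Predict the cell number for a new sample with a hypothetical log2 expression of 6.5
predicted_pop <- predict(fit_pop, newdata = data.frame(log2_expr = 6.5))

print(c(B0 = b0, B1 = b1, r = r, R2 = r_squared, p = p_value))
print(predicted_pop)
```

Predicting cell numbers from many genes at once is a different modelling problem from this one-gene sketch.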


We don’t provide a correlation coefficient in DESeq2. The adjusted p-value helps you find a set of genes for which the FDR is bounded, given the specified model. It is not a model fit statistic.

You may want to discuss this with a statistician.
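For the point about adjusted p-values: `results()` adjusts the per-gene p-values with the Benjamini-Hochberg procedure by default (`pAdjustMethod = "BH"`). The base-R sketch below re-implements that adjustment on a few made-up p-values and checks it against `p.adjust()`. Each gene's adjusted value depends on the p-values of all the other genes tested, which is one way to see that it says nothing about how well a single gene fits the model.

```r
# Benjamini-Hochberg adjustment written out by hand
bh_adjust <- function(p) {
  n  <- length(p)
  o  <- order(p, decreasing = TRUE)  # largest p-value first
  ro <- order(o)                     # permutation that restores the input order
  # scale each p-value by n / rank, enforce monotonicity, cap at 1
  pmin(1, cummin(n / (n:1) * p[o]))[ro]
}

p    <- c(0.02, 0.5, 0.001, 0.01)  # made-up per-gene p-values
padj <- bh_adjust(p)
print(padj)
print(p.adjust(p, method = "BH"))
```

DESeq2 additionally applies independent filtering before the adjustment by default, so some genes can get `NA` in `padj`.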
