Question: deseq2 normalized data
akp wrote (3.7 years ago):

I understand the idea of using the negative binomial distribution to test whether a covariate is differentially expressed/abundant or not.

I wonder whether the same argument holds when the analysis is not performing any test but, for example, regressing these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance-stabilizing transformation ...

Tags: regression, deseq2, counts

I think you will need to be more specific about what you mean by "regression analysis".

written 3.7 years ago by Ryan C. Thompson

Regressing genes on the outcome (case/control).

written 3.7 years ago by akp
Answer: deseq2 normalized data
Michael Love (United States) wrote (3.7 years ago):

"when the analysis is not performing any test but, for example, regressing these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance-stabilizing transformation ..."

Sorry, this is not clear enough for me to give an answer. The DESeq2 GLM is in fact very similar to a regression of the expected normalized counts, on the log scale, on case/control status. Can you restate the question more specifically in terms of your aims?
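
To make the comparison with regression concrete, the negative binomial GLM fitted by DESeq2 can be written as below. The notation is not from this thread and is only a summary: K_ij is the count for gene i in sample j, s_j is the sample's size factor, alpha_i is the gene's dispersion, and x_jr are the design-matrix entries.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr} \, \beta_{ir}
```

With a case/control design, the linear predictor reduces to an intercept plus one coefficient for case status, so that coefficient is the gene's log2 fold change between the groups.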


I am going to use a predictive model to classify cancer vs. non-cancer. You can think of it as a logistic regression; eventually, my model returns a coefficient for every covariate (gene). Then, when new data arrive, I can assign the new data points to either class based on those coefficients.

Typically, in this type of regression analysis, we standardize/rescale via "(x - mean(x))/sd(x)". I wonder whether one should use DESeq2-normalized data and skip "(x - mean(x))/sd(x)", or the other way around?

written 3.7 years ago by akp

I would recommend variance stabilizing with VST or rlog, and not dividing out the row (gene) standard deviation.*

* see this explanation: A: Biclustering Normalizing by Row in Heatmap of DESeq2
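
One way to see why dividing by the row standard deviation can be undesirable is the small simulation below. It is only a sketch with made-up numbers, not taken from the linked explanation: two hypothetical genes are placed on a log2-like scale, one with a real case/control shift and one with noise only. After z-scoring, both have unit spread, so the noise-only gene is inflated to the same scale as the informative one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 20
group = np.repeat([0, 1], n_per_group)  # 0 = control, 1 = case

# Two hypothetical genes on a log2-like (variance-stabilized) scale:
# "signal" shifts by 3 units between groups, "noise" does not shift at all.
signal = 8.0 + 3.0 * group + rng.normal(0.0, 0.3, group.size)
noise = 8.0 + rng.normal(0.0, 0.3, group.size)
x = np.vstack([signal, noise])  # genes x samples

# Centering keeps each gene's natural spread; z-scoring forces it to 1.
centered = x - x.mean(axis=1, keepdims=True)
zscored = centered / x.std(axis=1, ddof=1, keepdims=True)

sd_centered = centered.std(axis=1, ddof=1)
sd_zscored = zscored.std(axis=1, ddof=1)
print("SD after centering only:", sd_centered)
print("SD after z-scoring:     ", sd_zscored)
```

With centering only, the gene carrying the group difference keeps a much larger spread than the noise-only gene; after z-scoring the two are indistinguishable by scale.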

With the variance-stabilized data, you can then apply any kind of machine learning or prediction algorithm you like.
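
The workflow described here could look roughly like the sketch below. Several things in it are assumptions for illustration only: the counts are simulated, `log2(x + 1)` on size-factor-normalized counts is a crude stand-in for DESeq2's `vst()` or `rlog()` (which are R functions), and the helper names `log_geo_means`, `size_factors`, `transform`, and `fit_logistic` are hypothetical. The size factors use the median-of-ratios idea, with the reference computed from the training samples only and reused for the held-out samples, and genes are centered but not divided by their standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Simulated counts (genes x samples); every number here is made up ---
n_genes, n_samples = 200, 60
y = np.repeat([0, 1], n_samples // 2)        # 0 = control, 1 = case
depth = rng.uniform(0.5, 2.0, n_samples)     # per-sample sequencing depth
base = rng.uniform(50, 500, n_genes)         # baseline mean per gene
fold = np.ones((n_genes, 2))
fold[:10, 1] = 4.0                           # first 10 genes are 4x higher in cases
mu = base[:, None] * fold[:, y] * depth[None, :]
r = 1.0 / 0.1                                # NB size parameter for dispersion 0.1
counts = rng.negative_binomial(r, r / (r + mu))


def log_geo_means(counts):
    """Per-gene mean of log counts; NaN for genes containing a zero count."""
    with np.errstate(divide="ignore"):
        lg = np.log(counts).mean(axis=1)
    lg[~np.isfinite(lg)] = np.nan
    return lg


def size_factors(counts, ref):
    """Median-of-ratios size factor per sample against a fixed reference."""
    ok = np.isfinite(ref)
    with np.errstate(divide="ignore"):
        log_ratio = np.log(counts[ok]) - ref[ok, None]
    return np.exp(np.median(log_ratio, axis=0))


def transform(counts, ref):
    """Normalize by size factor, then log2(x + 1): a stand-in for VST/rlog."""
    return np.log2(counts / size_factors(counts, ref) + 1.0)


def fit_logistic(X, y, lam=0.1, lr=0.05, n_iter=500):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
        b -= lr * np.mean(p - y)
    return w, b


# Hold out every third sample to play the role of "new data"
is_test = np.arange(n_samples) % 3 == 0
train, test = counts[:, ~is_test], counts[:, is_test]

ref = log_geo_means(train)                   # reference learned from training only
X_train = transform(train, ref).T            # samples x genes
X_test = transform(test, ref).T
gene_mean = X_train.mean(axis=0)             # center per gene; no division by SD

w, b = fit_logistic(X_train - gene_mean, y[~is_test])
pred = ((X_test - gene_mean) @ w + b > 0).astype(int)
accuracy = np.mean(pred == y[is_test])
print("held-out accuracy:", accuracy)
```

The point of the design is that everything learned from the data (reference for size factors, per-gene means, coefficients) comes from the training samples, so a new sample can be transformed and classified without refitting.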

written 3.7 years ago by Michael Love