Question: Distribution of transformed data using VST
2.0 years ago by lirongrossmann

Hi Everybody,

I would like to apply a variance stabilizing transformation (VST) to raw RNA-seq counts and then build a machine learning model that predicts one of two classes. I wanted to use linear discriminant analysis, but I am not sure whether I can assume that the transformed data are normally distributed.

Based on your experience, what classification model works best on transformed counts?

Thanks

Answer: Distribution of transformed data using VST
2.0 years ago by Michael Love (United States)

(I added the DESeq2 tag. If you don't add a package name, you won't trigger an email to the package maintainer.)

The transformed data are what we recommend for use with downstream methods. The transformations are roughly on the log scale, which helps to deal with the skew of count data, and they are variance stabilized. Compared to log(x+1) or something similar, variance stabilization reduces the noise that other transformations introduce for small counts, which obviates the choice of a cutoff for low-signal features. Are the transformed counts normally distributed (across samples)? That is certainly not guaranteed by the transformations. Instead, they offer the two advantages listed above, which I would expect to generally improve the performance of many downstream methods.
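
To make the point about small counts concrete, here is a rough sketch in plain Python. It is not DESeq2's transformation: it assumes pure Poisson counts (real RNA-seq counts are overdispersed) and uses the classical Anscombe transform, 2*sqrt(x + 3/8), as a stand-in variance stabilizing transformation. The function names are made up for this illustration. It computes the exact variance of the transformed counts at several mean values, so no random sampling is involved.

```python
import math

def poisson_pmf(k, lam):
    # Poisson probability computed in log space to avoid overflow for large k.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def transformed_variance(transform, lam):
    # Variance of transform(X) for X ~ Poisson(lam), summed over a range
    # wide enough that the truncated tail is negligible.
    k_max = int(lam + 12 * math.sqrt(lam) + 30)
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    vals = [transform(k) for k in range(k_max + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))

def log2_plus_one(k):
    # The "log(x+1) or something similar" transformation.
    return math.log2(k + 1)

def anscombe(k):
    # Classical VST for Poisson data (an assumption of this sketch,
    # not the transformation DESeq2 applies).
    return 2.0 * math.sqrt(k + 0.375)

means = [2, 5, 20, 100, 1000]
log_vars = [transformed_variance(log2_plus_one, m) for m in means]
vst_vars = [transformed_variance(anscombe, m) for m in means]

for m, lv, vv in zip(means, log_vars, vst_vars):
    print(f"mean={m:>5}  var[log2(x+1)]={lv:.4f}  var[anscombe]={vv:.4f}")
```

Under these assumptions, the variance after log2(x+1) is much larger for low-mean features than for high-mean ones, while the variance after the stabilizing transform stays close to constant across the range. Neither transformation makes the values normally distributed; that would need to be checked separately on the data.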
