Question: Distribution of transformed data using VST
6 months ago by
lirongrossmann10 wrote:

Hi Everybody,

I would like to apply a variance stabilizing transformation (VST) to raw RNA-seq counts and use the result to build a machine learning model that predicts two classes. I wanted to use linear discriminant analysis, but I am not sure whether I can assume that the transformed data are normally distributed.

Based on your experience, what classification model works best on transformed counts?


6 months ago by
Michael Love17k (United States) wrote:

(I added the DESeq2 tag. If you don't add a package name, you won't trigger an email to the package maintainer.)

The transformed data are what we recommend for working with downstream methods. The transformations are roughly on the log scale, which helps to deal with the skew of count data, and they are variance stabilized. Compared to log(x+1) or something similar, variance stabilization reduces the noise that other transformations introduce at small counts, obviating the choice of a cutoff for low-signal features. Are the transformed counts normally distributed (across samples)? The transformations certainly do not guarantee this. What they offer instead are the two advantages listed above, which I would expect to generally improve the performance of many downstream methods.
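
To make the point about small counts concrete, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions, not the DESeq2 procedure: counts are simulated as Poisson (real RNA-seq counts are overdispersed), and the Anscombe transform 2*sqrt(x + 3/8) stands in as a simple variance-stabilizing transformation for Poisson data. The function names (`poisson_sample`, `variance_by_mean`) are hypothetical helpers made up for this example.

```python
import math
import random
import statistics


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def log1p_transform(x):
    # The "log(x+1) or something similar" transformation.
    return math.log(x + 1.0)


def anscombe_transform(x):
    # Variance-stabilizing for Poisson counts only.
    # Assumption: used here as a stand-in, it is NOT the DESeq2 VST.
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def variance_by_mean(means, n_samples=5000, seed=0):
    """Simulate counts at each mean and return the variance after each transform."""
    rng = random.Random(seed)
    results = {}
    for lam in means:
        counts = [poisson_sample(rng, lam) for _ in range(n_samples)]
        results[lam] = {
            "log1p": statistics.variance([log1p_transform(c) for c in counts]),
            "anscombe": statistics.variance([anscombe_transform(c) for c in counts]),
        }
    return results


if __name__ == "__main__":
    for lam, v in variance_by_mean([5, 20, 100]).items():
        print(f"mean={lam:>3}  var log(x+1)={v['log1p']:.4f}  "
              f"var anscombe={v['anscombe']:.4f}")
```

In this toy setting the variance after log(x+1) is much larger for the low-mean counts than for the high-mean ones, while the variance after the stabilizing transformation stays close to 1 across means. That is the sense in which a method that weights all features equally should see less noise from low-count features. Nothing here implies the transformed values are normally distributed.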
