Distribution of transformed data using VST

Hi Everybody,

I would like to apply the variance stabilizing transformation (VST) to raw RNA-seq counts and use the transformed data to build a machine learning model that predicts two classes. I wanted to use linear discriminant analysis, but I am not sure whether I can assume that the transformed data are normally distributed.

Based on your experience, what classification model works best on transformed counts?


variancestabilizingtransformation deseq2 • 541 views

(I added the DESeq2 tag. If you don't add a package name, you won't trigger an email to the package maintainer.)

The transformed data are what we recommend for use with downstream methods. The transformations are roughly on the log scale, which helps with the skew of count data, and they are variance stabilized. Compared with log(x+1) or something similar, this reduces the noise that such transformations introduce at small counts, and it obviates the choice of a cutoff for low-signal features. Are the transformed counts normally distributed (across samples)? The transformations do not guarantee this. What they do offer is the two advantages listed above, which I would expect to generally improve the performance of many downstream methods.
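To make the point about small counts concrete, here is a minimal sketch in plain Python. It is not DESeq2's VST, which is built on a negative binomial model with a fitted dispersion-mean trend. As a stand-in it assumes Poisson counts and uses the Anscombe transform 2*sqrt(x + 3/8), a classical variance-stabilizing transform for that simpler model. The function names and the chosen means are hypothetical, picked only for illustration. The script computes, by direct summation over the Poisson probabilities, the variance of each transformed count at several mean levels.

```python
import math


def poisson_pmf(mu, kmax):
    """Return [P(X=0), ..., P(X=kmax)] for X ~ Poisson(mu)."""
    probs = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        # Recurrence: P(X=k) = P(X=k-1) * mu / k
        probs.append(probs[-1] * mu / k)
    return probs


def transformed_variance(mu, transform):
    """Variance of transform(X) for X ~ Poisson(mu), by summing over the pmf."""
    # Truncate far into the upper tail; the omitted mass is negligible.
    kmax = int(mu + 10 * math.sqrt(mu) + 50)
    probs = poisson_pmf(mu, kmax)
    mean = sum(p * transform(k) for k, p in enumerate(probs))
    return sum(p * (transform(k) - mean) ** 2 for k, p in enumerate(probs))


def log2_plus_one(x):
    """The shifted log transform, log2(x + 1)."""
    return math.log2(x + 1)


def anscombe(x):
    """Anscombe transform for Poisson counts (an assumption, not DESeq2's VST)."""
    return 2.0 * math.sqrt(x + 0.375)


# Hypothetical mean count levels, from very low to moderately high signal.
MEANS = [1, 5, 20, 100]

log_vars = [transformed_variance(mu, log2_plus_one) for mu in MEANS]
anscombe_vars = [transformed_variance(mu, anscombe) for mu in MEANS]

for mu, lv, av in zip(MEANS, log_vars, anscombe_vars):
    print(f"mean={mu:>3}  var[log2(x+1)]={lv:.3f}  var[anscombe(x)]={av:.3f}")
```

Under these assumptions, the variance of log2(x+1) changes by more than an order of magnitude between the lowest and highest mean, while the variance of the stabilized values stays within a narrow range. None of this says anything about normality across samples, so it does not settle the question of whether linear discriminant analysis is appropriate.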

