Question: Distribution of transformed data using VST
lirongrossmann wrote (4 weeks ago):

Hi Everybody,

I would like to apply a variance stabilizing transformation (VST) to raw RNA-seq counts and then build a machine learning model that predicts two classes. I wanted to use linear discriminant analysis, but I am not sure whether I can assume that the transformed data are normally distributed.

Based on your experience, what classification model works best on transformed counts?


Michael Love (United States) wrote (29 days ago):

(I added the DESeq2 tag. If you don't add a package name, you won't trigger an email to the package maintainer.)

The transformed data are what we recommend for use with downstream methods. The transformations are roughly on the log scale, which helps with the skew of count data. They are also variance stabilized, which, compared to log(x+1) or something similar, reduces the noise that such transformations introduce at small counts and removes the need to choose a cutoff for low-signal features. Are the transformed counts normally distributed (across samples)? The transformations certainly do not guarantee this. What they offer instead are the two advantages above, which I would expect to generally improve the performance of many downstream methods.
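
To make the "variance stabilized" point concrete, here is a minimal sketch in plain Python. It is not DESeq2's transformation: DESeq2's `vst()` works from a fitted dispersion-mean trend for negative binomial counts, whereas this sketch assumes pure Poisson counts and swaps in the classical Anscombe transform, 2*sqrt(x + 3/8), as a stand-in. The function names are made up for illustration. It computes the exact standard deviation of a transformed Poisson count at several means, so you can see how the spread of log2(x+1) depends on the mean while the stabilized scale stays roughly constant.

```python
import math

def poisson_pmf(k, mean):
    # Poisson probability of observing k, computed in log space for stability.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def sd_after_transform(mean, transform):
    # SD of transform(X) for X ~ Poisson(mean), summing the pmf over a
    # range wide enough that the neglected upper tail is negligible.
    upper = int(mean + 12 * math.sqrt(mean) + 30)
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mean) for k in ks]
    center = sum(p * transform(k) for k, p in zip(ks, probs))
    variance = sum(p * (transform(k) - center) ** 2 for k, p in zip(ks, probs))
    return math.sqrt(variance)

def log2_plus_one(k):
    # The common shifted-log transform.
    return math.log2(k + 1)

def anscombe(k):
    # Classical variance-stabilizing transform for Poisson counts
    # (an illustrative stand-in, not the DESeq2 VST).
    return 2.0 * math.sqrt(k + 3.0 / 8.0)

print("mean  sd[log2(x+1)]  sd[anscombe]")
for mean in (0.5, 2, 10, 100, 1000):
    sd_log = sd_after_transform(mean, log2_plus_one)
    sd_vst = sd_after_transform(mean, anscombe)
    print(f"{mean:>6}  {sd_log:13.3f}  {sd_vst:12.3f}")
```

Under these assumptions, the Anscombe column should stay close to 1 once the mean is around 10 or higher, while the log2(x+1) column keeps shrinking as the mean grows. Note that neither column says anything about normality across samples, which is a separate question from variance stabilization.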
