Distribution of transformed data using VST
@lirongrossmann-13938

Hi Everybody,

I would like to apply the variance stabilizing transformation (VST) to raw RNA-seq counts and then build a machine learning model to predict two classes. I was planning to use linear discriminant analysis, but I am not sure whether I can assume that the transformed data are normally distributed.

Based on your experience, what classification model works best on transformed counts?

Thanks

variancestabilizingtransformation deseq2 • 1.5k views
@mikelove

(I added the DESeq2 tag. If you don't add a package name, you won't trigger an email to the package maintainer.)

The transformed data are what we recommend for working with downstream methods. The transformations are roughly on the log scale, which helps deal with the skew of count data. They are also variance stabilized, which, compared to log(x+1) or something similar, reduces the noise that other transformations introduce for small counts, obviating the choice of a cutoff for low-signal features. Are the transformed counts normally distributed (across samples)? That is certainly not guaranteed by the transformations. What they do offer is the two advantages listed above, which I would expect to generally improve the performance of many downstream methods.
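To make the variance-stabilization point concrete, here is a minimal simulation sketch in plain Python. It is not DESeq2's vst() implementation. It assumes a negative binomial model with variance mu + alpha*mu^2 and a single known dispersion (ALPHA = 0.1 is an arbitrary, hypothetical value), and uses the closed-form transform that stabilizes the variance under that model. The helper names sample_nb, nb_vst and sd_by_mean are made up for illustration.

```python
import math
import random
import statistics

ALPHA = 0.1  # assumed dispersion; a hypothetical value, not estimated from data


def sample_nb(mean, alpha, rng):
    """Draw one negative binomial count as a gamma-Poisson mixture."""
    # Gamma-distributed rate with mean `mean` and variance alpha * mean**2.
    rate = rng.gammavariate(1.0 / alpha, alpha * mean)
    # Poisson draw: count unit-rate exponential arrivals before time `rate`.
    count = 0
    t = rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count


def log2p1(x):
    """The simple log2(x + 1) transform."""
    return math.log2(x + 1)


def nb_vst(x, alpha=ALPHA):
    """Closed-form variance-stabilizing transform for var = mu + alpha * mu**2."""
    return 2.0 / math.sqrt(alpha) * math.asinh(math.sqrt(alpha * x))


def sd_by_mean(transform, means=(2, 20, 200), n=2000, seed=1):
    """Standard deviation of the transformed counts at each true mean."""
    rng = random.Random(seed)
    return {
        m: statistics.stdev(transform(sample_nb(m, ALPHA, rng)) for _ in range(n))
        for m in means
    }


if __name__ == "__main__":
    print("log2(x+1):", sd_by_mean(log2p1))
    print("NB VST   :", sd_by_mean(nb_vst))
```

Under these assumptions, the standard deviation after the stabilizing transform should sit near 1 at moderate and high means (to first order, by the delta method), while after log2(x+1) it depends more strongly on the mean. Neither result says anything about normality of the transformed values, so a normality assumption for LDA would still have to be checked on the actual data.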
