Question

limma KNN classification with voom-transformed data


andreas.scherer • 0


Hello, I am analysing RNAseq data with limma. As input for KNN classification, should I use log2cpm data or voom transformed data? Thank you so much for your consideration.

limma KNN • 642 views

updated 2.4 years ago by Steve Lianoglou • written 2.4 years ago by andreas.scherer

score 1 · Answer 1 · 2021-12-09

For all intents and purposes, unless you are using a method that can take advantage of the observation weights that come out of voom(), "voom transformed data" is essentially just "log2cpm" with a small prior count (0.5).
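To make that concrete, here is a minimal sketch on simulated counts that compares the `$E` matrix from voom() with a hand-computed log2-CPM using a 0.5 offset. The simulated data, the object names, and the plain count matrix input (so that library sizes are just column sums) are all assumptions for illustration. The manual formula reflects my understanding of how voom() computes `$E`, so treat it as a check you can run rather than a specification.

```r
library(limma)

set.seed(42)
n_genes <- 1000
n_samples <- 6

# Simulated counts (an assumption for illustration, not real data)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 20, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# voom() returns an EList: $E holds the log2-CPM values, $weights the
# observation weights that a plain KNN procedure would ignore
v <- voom(counts, design)

# Hand-computed log2-CPM with an offset of 0.5 on the counts
# (and 1 on the library sizes)
lib_size <- colSums(counts)
manual_E <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))

# Compare the two matrices, ignoring attributes such as dimnames
all.equal(v$E, manual_E, check.attributes = FALSE)
```

If the comparison agrees, then passing `v$E` to KNN is the same as passing a log2-CPM matrix with a very small offset, with the weights simply dropped.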

The problem with that is that, with such a small prior count, you will have more variance around the lower expression values of your log2cpm data, but your downstream analysis tools will likely expect the data to be more homoscedastic. This is OK for voom, because the weights are incorporated in the analysis, but they are likely not used in your KNN procedure, or whatever else you want to throw at it.

As you call edgeR::cpm(y, log = TRUE, prior.count = N) with larger and larger values of N, you will "hammer out" more and more of the variance at the low end of expression. It is often suggested on this support forum to use a value for prior.count between 3 and 5 to get your data "approximately" where you want it to be prior to feeding it into some clustering, PCA, or whatever other algorithm you choose to run. So you should prefer this approach over the "voomed" $E matrix.
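A rough sketch of that workflow might look like the following. The simulated counts, the train/test split, the choice of k = 3, and the use of knn() from the class package are all assumptions for illustration; any KNN implementation that takes a samples-by-features matrix would do.

```r
library(edgeR)
library(class)  # knn()

set.seed(1)
n_genes <- 500
n_samples <- 20
group <- factor(rep(c("A", "B"), each = 10))

# Simulated counts (an assumption for illustration, not real data)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 50, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_len(n_samples)))
)
# Make the first 50 genes higher in group B so there is something to classify
counts[1:50, group == "B"] <- rnbinom(50 * 10, mu = 200, size = 5)

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)               # drop lowly expressed genes
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)               # TMM normalization factors

# log2-CPM with a larger prior count to damp variance at low expression
logcpm <- cpm(y, log = TRUE, prior.count = 3)

# knn() expects samples in rows and features in columns
x <- t(logcpm)
train <- c(1:7, 11:17)
test <- setdiff(seq_len(n_samples), train)

pred <- knn(train = x[train, ], test = x[test, ], cl = group[train], k = 3)
table(predicted = pred, actual = group[test])
```

For a real classifier you would want to do the gene filtering and any feature selection inside the training folds only, so the test samples do not leak into the model. The sketch skips that to stay short.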

Another approach is to use the output from the vst (variance stabilizing transformation) method found in the DESeq2 package to do the same. Perhaps you can think of the vst transformation in DESeq2 as similar to edgeR::cpm(y, log = TRUE, prior.count = N), except that the value of N isn't constant throughout, which is to say that its value adapts in some smart way within the vst procedure itself.
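If you want to try the DESeq2 route, a minimal sketch could look like this. The simulated counts, the sample table, and the `~ group` design are assumptions for illustration. I use blind = TRUE here so the transformation does not look at the group labels, which seems the safer choice when the output feeds a classifier.

```r
library(DESeq2)

set.seed(1)
n_genes <- 2000
n_samples <- 8

# Simulated counts (an assumption for illustration, not real data)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_len(n_samples)))
)

coldata <- data.frame(
  group = factor(rep(c("A", "B"), each = 4)),
  row.names = colnames(counts)
)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ group)

# Variance stabilizing transformation; blind = TRUE ignores the design
vsd <- vst(dds, blind = TRUE)

# Genes-by-samples matrix on a log2-like scale; transpose before KNN
vst_mat <- assay(vsd)
dim(vst_mat)
```

Note that vst() estimates the dispersion trend from a subset of genes (1000 by default, via its nsub argument), so on a very small gene set you would need to lower nsub or call varianceStabilizingTransformation() directly.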