Question

Using voom-transformed counts for unsupervised analyses (PCA, Random Forest, Elastic Net): Best practices?


I'm using the voom transformation (limma package) for RNA-seq analysis and am considering using the voom-transformed counts (rather than CPM or log2(CPM)) for downstream analyses such as PCA, Random Forest, and Elastic Net. The homoscedasticity that voom provides seems advantageous for these methods, but I'm not sure whether this is advisable. If it is, what are the best practices? Specifically, should the voom transformation be performed without a design matrix for these unsupervised analyses to avoid potential bias?

Thanks in advance!

R version 4.1.2 (2021-11-01)
limma_3.50.3

Tags: limma, limma-voom

Asked by Ahdee; most recent activity by James W. MacDonald.

score 1 · Answer 1 · 2024-11-18

The 'voom-transformed counts' are just logCPM values computed with a prior count of 0.5 (and 1 added to the library size). The 'magic' of limma-voom is in the precision weights it computes, which are then used in a weighted linear regression to account for heteroscedasticity. In other words, the observation-level weights are used in a linear regression, with the logCPM values as the outcome, to remove heteroscedasticity from the model residuals. But it appears you want to use the gene expression data as predictors, not outcomes, in which case I don't think the weights will be helpful (model weights apply to the outcome, not the predictors).
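A minimal NumPy sketch of the two points above, under stated assumptions. `voom_logcpm` reproduces the transform log2((count + 0.5) / (lib.size + 1) × 10^6) using raw column sums as library sizes; real voom uses normalized library sizes when normalization factors are available, so treat this as an illustration rather than a drop-in replacement. `weighted_fit` is a generic weighted least-squares fit, not limma's `lmFit`, and the weights in the example are made up (voom estimates its weights from the mean-variance trend). The point is only where the weights enter: each one scales the squared residual of one observation of the outcome `y`.

```python
import numpy as np

def voom_logcpm(counts):
    """voom-style log2-CPM for a genes x samples count matrix.

    Assumption: library sizes are plain column sums (no TMM/normalization factors).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample (column)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

def weighted_fit(y, X, w):
    """Weighted least squares for one gene: minimise sum_i w_i * (y_i - X_i @ b)^2.

    Each weight w_i is attached to observation i of the outcome y; scaling rows of
    X and y by sqrt(w_i) turns the weighted problem into an ordinary least-squares one.
    """
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Toy example: 3 genes x 4 samples (hypothetical counts).
counts = np.array([[10, 20, 30, 40],
                   [100, 80, 120, 90],
                   [0, 1, 0, 2]])
logcpm = voom_logcpm(counts)

# Intercept + two-group design; hypothetical per-sample weights for gene 0.
design = np.column_stack([np.ones(4), [0, 0, 1, 1]])
w = np.array([1.0, 0.5, 1.0, 2.0])
beta = weighted_fit(logcpm[0], design, w)
print(beta)  # [intercept, group effect] for gene 0 on the log2-CPM scale
```

When the same logCPM matrix is instead used as predictors for PCA, a random forest, or an elastic net (samples as rows, i.e. `logcpm.T`), the voom weights, which belong to individual expression values, have no outcome residual to act on.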

An alternative I have used in the past for WGCNA, where you want the information provided by each gene to be roughly equivalent, is the cqn package, which generates RPKM values adjusted for GC-content bias and gene length; in principle these provide 'purer' gene expression values.
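For reference, a minimal NumPy sketch of plain RPKM (reads per kilobase of gene length per million mapped reads), the quantity cqn starts from. This is only the unadjusted formula: cqn goes further and models GC-content and gene-length effects via conditional quantile normalization, which this sketch does not attempt. The function name `rpkm` and the toy counts and lengths are hypothetical.

```python
import numpy as np

def rpkm(counts, gene_length_bp):
    """Plain RPKM for a genes x samples count matrix (no GC-content correction).

    gene_length_bp holds one length in base pairs per gene (row).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)                              # mapped reads per sample
    length_kb = np.asarray(gene_length_bp, dtype=float) / 1e3  # gene length in kilobases
    return counts / length_kb[:, None] / (lib_size / 1e6)

# Toy example: 2 genes x 2 samples, hypothetical lengths of 1 kb and 2 kb.
counts = np.array([[500, 1000],
                   [500, 0]])
lengths = np.array([1000, 2000])
vals = rpkm(counts, lengths)
print(vals)
```

With equal counts, the 2 kb gene gets half the RPKM of the 1 kb gene, which is the length adjustment that makes genes more comparable to one another.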