Using voom-transformed counts for unsupervised analyses (PCA, Random Forest, Elastic Net): Best practices?
Ahdee ▴ 50
@ahdee-8938
United States

I'm using the voom transformation (limma package) for RNA-seq analysis and am considering using the voom-transformed counts (rather than CPM or log2(CPM)) for downstream analyses such as PCA, Random Forest, and Elastic Net. The homoscedastic property of the voom transformation seems advantageous for these methods; however, I'm not sure whether this is advisable. If it is, I'm wondering about best practices: specifically, should the voom transformation be performed without a design matrix for these unsupervised analyses to avoid potential bias?

Thanks in advance!

R version 4.1.2 (2021-11-01)
limma_3.50.3
limma limma-voom
@james-w-macdonald-5106
United States

The 'voom transformed counts' are just logCPM with a prior count of 0.5. The 'magic' of limma-voom is the weights that are computed and then used in a weighted linear regression to account for the heteroscedasticity. In other words, the observational weights are used in a linear regression (with the logCPM values as the outcome) in order to remove heteroscedasticity of the model residuals. But it appears you want to use the gene expression data as predictors, not outcomes, in which case I don't think the weights are going to be helpful (model weights apply to the outcome, not the predictors).
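
To make the point above concrete, here is a minimal base-R sketch of the logCPM values with an offset of 0.5, followed by a PCA that uses them as plain inputs. The toy count matrix and the names `counts`, `lib_size`, `log_cpm`, and `pca` are made up for illustration, and library sizes are assumed to be plain column sums with no normalization factors. Under those assumptions, the result should match the `E` component returned by `voom()` with default settings. The design matrix does not enter this calculation; it only affects the weights, which the PCA step never sees.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(
  c(10, 0, 250, 33,
    12, 3, 300, 40,
     8, 1, 190, 25),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Assumption: library sizes are plain column sums (no normalization factors).
lib_size <- colSums(counts)

# logCPM with an offset of 0.5 on the counts and 1 on the library size.
# t() is used so that each sample is divided by its own library size.
log_cpm <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))

# PCA with samples as observations and genes as variables.
# The values enter as predictors only; no observation weights are involved.
pca <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)

print(round(log_cpm, 2))
print(summary(pca))
```

The same `log_cpm` matrix (transposed so that samples are rows) could be passed to a Random Forest or Elastic Net fit in the same way; as noted above, the voom weights would not carry over to that setting.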

An alternative I have used in the past for WGCNA, where you want the information provided by each gene to be roughly equivalent, is the cqn package, which generates GC-content- and length-adjusted RPKM values that should, in principle, give 'purer' gene expression values.
