Top variable features used by 'runPCA' in scater
jws • 0

As the scater vignette (https://bioconductor.org/packages/devel/bioc/vignettes/scater/inst/doc/vignette-dataviz.html#generating-pca-plots) describes, by default, runPCA performs PCA on the log-counts using the 500 features with the most variable expression across all cells.

I am wondering how the most variable expression is determined, and how the names of features (genes) can be extracted. Thanks!

scater pca features • 1.4k views
Aaron Lun ★ 28k

It's pretty literal. The top 500 genes with the largest variance of the log-counts are used - and that's it. You can get them by doing:

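library(scater)  # also attaches SingleCellExperiment, which provides logcounts()
vars <- DelayedMatrixStats::rowVars(logcounts(sce))
top <- head(order(vars, decreasing=TRUE), 500)  # row indices of the 500 most variable genes
rownames(sce)[top]  # the corresponding gene names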
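
If it helps to see the whole idea end to end without any Bioconductor objects, here is a minimal base-R sketch of "rank genes by variance, keep the top few, run PCA on those". The matrix `logc` is simulated toy data standing in for `logcounts(sce)`, the gene and cell names are made up, and `n_top` is just an illustrative variable. The call to `prcomp` centers but does not scale the features, which is an assumption on my part and may not match the defaults of your scater version.

```r
# Toy stand-in for logcounts(sce): 200 hypothetical genes x 30 cells.
set.seed(1)
logc <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))
# Inflate the spread of the first 10 genes so they should rank highly.
logc[1:10, ] <- logc[1:10, ] * 5

# Per-gene variance across cells, then the n_top most variable genes.
n_top <- 50
vars <- apply(logc, 1, var)
top_idx <- head(order(vars, decreasing = TRUE), n_top)
top_genes <- rownames(logc)[top_idx]

# PCA on the selected genes only, with cells as observations (rows).
# Centering without scaling is an assumption, not a statement about scater.
pca <- prcomp(t(logc[top_idx, ]), center = TRUE, scale. = FALSE)

head(top_genes)  # names of the most variable genes
dim(pca$x)       # cells x principal components
```

The point is only that the selection step is a plain sort on per-gene variance, so the gene names fall out of `rownames()` indexed by the same ordering.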

There's no consideration of the mean-variance trend or of technical components of variance or anything like that. If you want something more sophisticated, check out trendVar and decomposeVar (or possibly technicalCV2 and improvedCV2) in scran.
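
To make "consideration of the mean-variance trend" a bit more concrete, here is a rough base-R illustration of the general idea: fit a smooth trend of per-gene variance against per-gene mean, and rank genes by how far they sit above that trend rather than by raw variance. This is not how the scran functions work internally. It is a simplified sketch on simulated data, and the `loess` fit, the toy matrix `logc`, and the name `excess` are all my own choices for illustration.

```r
# Toy log-expression matrix (hypothetical data): 300 genes x 40 cells,
# where the spread of each gene grows with its mean expression.
set.seed(2)
mu <- runif(300, 1, 8)
logc <- matrix(rnorm(300 * 40, mean = mu, sd = 0.2 + 0.1 * mu), nrow = 300,
               dimnames = list(paste0("gene", 1:300), NULL))

# Give the first 15 genes extra cell-to-cell variation on top of the trend.
extra <- 1:15
logc[extra, ] <- logc[extra, ] + matrix(rnorm(15 * 40, sd = 3), nrow = 15)

means <- rowMeans(logc)
vars <- apply(logc, 1, var)

# Smooth trend of variance against mean, used here as the variance
# "expected" for a gene at that abundance (an illustrative assumption).
fit <- loess(vars ~ means)
excess <- vars - fitted(fit)
names(excess) <- rownames(logc)

# Rank genes by variance in excess of the trend instead of raw variance.
top_by_excess <- names(head(sort(excess, decreasing = TRUE), 15))
top_by_excess
```

Ranking by raw variance would tend to favour highly expressed genes in data like this, simply because their variance is larger across the board. Subtracting a fitted trend is one simple way to correct for that.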
