Question: Top variable features used by 'runPCA' in scater
jws0 wrote (6 months ago):

As the scater vignette (https://bioconductor.org/packages/devel/bioc/vignettes/scater/inst/doc/vignette-dataviz.html#generating-pca-plots) describes, by default, runPCA performs PCA on the log-counts using the 500 features with the most variable expression across all cells.

I am wondering how the most variable expression is determined, and how the names of features (genes) can be extracted. Thanks!

Answer: Top variable features used by 'runPCA' in scater
Aaron Lun wrote (6 months ago):

It's pretty literal. The top 500 genes with the largest variance of the log-counts are used - and that's it. You can get them by doing:

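vars <- DelayedMatrixStats::rowVars(logcounts(sce))  # per-gene variance of the log-counts
top <- head(order(vars, decreasing=TRUE), 500)       # row indices of the 500 most variable genes
rownames(sce)[top]                                   # the corresponding gene names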

There's no consideration of the mean-variance trend or of technical components of variance or anything like that. If you want something more sophisticated, check out trendVar and decomposeVar (or possibly technicalCV2 and improvedCV2) in scran.
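
To make the selection rule above concrete, here is a minimal sketch on a toy matrix. The matrix `logexpr`, its gene and cell names, and its values are made up for illustration and stand in for `logcounts(sce)`. Base R `apply(..., 1, var)` is used in place of `DelayedMatrixStats::rowVars` so the example runs without Bioconductor, and `ntop` is set to 2 rather than 500.

```r
# Toy stand-in for logcounts(sce): 4 genes x 5 cells (values are invented).
logexpr <- rbind(
  geneA = c(1.0, 1.1, 0.9, 1.0, 1.0),  # nearly constant
  geneB = c(0.0, 4.0, 0.0, 4.0, 2.0),  # highly variable
  geneC = c(2.0, 2.5, 1.5, 2.0, 2.0),  # mildly variable
  geneD = c(0.0, 0.0, 6.0, 0.0, 0.0)   # driven by one outlying cell
)
colnames(logexpr) <- paste0("cell", 1:5)

# Per-gene variance across cells (base R substitute for rowVars).
vars <- apply(logexpr, 1, var)

# Row indices of the `ntop` genes with the largest variance, then their names.
ntop <- 2
top_idx <- head(order(vars, decreasing = TRUE), ntop)
top_genes <- rownames(logexpr)[top_idx]
top_genes
```

Note that geneD ranks first purely because of a single outlying cell, which illustrates the point that nothing beyond raw variance is considered. With a real object, replacing `logexpr` with `logcounts(sce)` and `ntop` with 500 should give the same kind of ranking, assuming the default settings described in the question.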
