Question: Top variable features used by 'runPCA' in scater
5 weeks ago, jws wrote:

As the scater vignette (https://bioconductor.org/packages/devel/bioc/vignettes/scater/inst/doc/vignette-dataviz.html#generating-pca-plots) describes, by default, runPCA performs PCA on the log-counts using the 500 features with the most variable expression across all cells.

I am wondering how the most variable expression is determined, and how the names of features (genes) can be extracted. Thanks!

Answer: Top variable features used by 'runPCA' in scater
5 weeks ago, Aaron Lun (Cambridge, United Kingdom) wrote:

It's pretty literal: the 500 genes with the largest variance of the log-counts are used, and that's it. You can get them by doing:

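vars <- DelayedMatrixStats::rowVars(logcounts(sce))
top <- head(order(vars, decreasing=TRUE), 500)  # row indices of the 500 most variable genes
rownames(sce)[top]  # the corresponding gene names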
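
For anyone who wants to try the selection without a real dataset, here is a minimal base-R sketch of the same idea on simulated data. The matrix, the gene and cell names, and the log2(count + 1) transform are all made up for illustration; real logcounts(sce) values are normally computed after size-factor normalization, so treat this as a toy stand-in rather than what scater does internally.

```r
# Toy data only: 2000 hypothetical genes x 50 hypothetical cells of simulated counts.
set.seed(1)
counts <- matrix(rnbinom(2000 * 50, mu = 5, size = 2), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:50)))

# Stand-in for logcounts(sce); assumes a pseudo-count of 1 and no normalization.
logc <- log2(counts + 1)

# Per-gene variance across cells, then the indices of the 500 largest.
vars <- apply(logc, 1, var)
top_idx <- head(order(vars, decreasing = TRUE), 500)

# Map the row indices back to feature names.
top_genes <- rownames(logc)[top_idx]
head(top_genes)
```

The only step that matters for the original question is the last one: order() returns row positions, so indexing the row names with them gives the gene names.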

There's no consideration of the mean-variance trend or of technical components of variance or anything like that. If you want something more sophisticated, check out trendVar and decomposeVar (or possibly technicalCV2 and improvedCV2) in scran.
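
To make the point about the mean-variance trend concrete, here is a rough base-R sketch on simulated data. It is not how trendVar or decomposeVar work; it only fits a loess curve of variance against mean and ranks genes by how far they sit above that curve, so that genes are not picked simply because of where they fall on the abundance scale. All names and simulation parameters are hypothetical.

```r
# Toy data only: per-gene mean counts vary, so variance depends on abundance.
set.seed(1)
n_genes <- 2000
mu <- rgamma(n_genes, shape = 2, rate = 0.2)   # hypothetical per-gene mean counts
counts <- matrix(rnbinom(n_genes * 50, mu = mu, size = 2), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
logc <- log2(counts + 1)                        # assumed stand-in for log-counts

means <- rowMeans(logc)
vars <- apply(logc, 1, var)

# Crude trend: smooth the variance as a function of the mean.
fit <- loess(vars ~ means)
excess <- vars - as.numeric(predict(fit))       # variance above the fitted trend

# Compare the two rankings.
top_by_raw <- names(sort(vars, decreasing = TRUE))[1:500]
top_by_excess <- names(sort(excess, decreasing = TRUE))[1:500]
length(intersect(top_by_raw, top_by_excess))    # how many genes both rankings share
```

The overlap between the two sets shows how much the raw-variance ranking is driven by the trend itself; the scran functions do this properly, with a separate estimate of the technical component.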
