Does PCA assume no heteroscedasticity?
Jake ▴ 90
Last seen 9 months ago
United States

DESeq2 stabilizes the variance of count data before running PCA. I've read (mostly on sites discussing DESeq2) that PCA assumes no heteroscedasticity. However, I've had trouble finding mathematical references that explain why PCA makes this assumption. Could someone point me to some?


deseq2 pca • 1.3k views
Last seen 12 hours ago
United States

One way to think about it is this: a PCA plot is an effective way to draw samples in 2 dimensions (rather than in ~10,000 dimensions) such that distances between samples are approximately preserved. However, if you directly apply the log transformation to counts, much of the distance between two points is contributed by genes with low average read counts, say ~1, because that is where the variance of the log counts is largest. See the first pair of plots here. The point of stabilizing the variance first is to ensure that, across the range of mean counts, genes have an equal chance of contributing to the distance metric.
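To make the point about low-count genes concrete, here is a minimal sketch in Python. It rests on assumptions that are not from DESeq2: counts are taken to be pure Poisson (real RNA-seq counts are overdispersed), and the square root is used as a stand-in variance-stabilizing transformation because it is the textbook choice for Poisson data. It is not the transformation DESeq2 applies. The function names are made up for illustration. The script computes the exact variance of a transformed Poisson count at several means by summing over the probability mass function, so there is no simulation noise.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of count k at mean mu, computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def variance_after_transform(mu, transform):
    """Variance of transform(X) for X ~ Poisson(mu), by direct summation."""
    # Truncate the support far enough into the tail that the lost mass is negligible.
    k_max = int(mu + 12 * math.sqrt(mu) + 30)
    support = range(k_max + 1)
    probs = [poisson_pmf(k, mu) for k in support]
    total = sum(probs)  # renormalize for the truncated tail
    values = [transform(k) for k in support]
    mean = sum(p * v for p, v in zip(probs, values)) / total
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values)) / total


def log2_plus_one(k):
    """Log transform with a pseudocount of 1 (assumed here for illustration)."""
    return math.log2(k + 1)


def square_root(k):
    """Variance-stabilizing transform for Poisson counts (illustrative stand-in)."""
    return math.sqrt(k)


MEANS = [1, 10, 100, 1000]
log_variances = [variance_after_transform(mu, log2_plus_one) for mu in MEANS]
sqrt_variances = [variance_after_transform(mu, square_root) for mu in MEANS]

print("mean   var[log2(x+1)]   var[sqrt(x)]")
for mu, lv, sv in zip(MEANS, log_variances, sqrt_variances):
    print(f"{mu:>4}   {lv:14.4f}   {sv:12.4f}")
```

Under these assumptions, the variance of the log-transformed counts shrinks by orders of magnitude as the mean grows, so genes with a mean near 1 contribute far more to squared Euclidean distances (and hence to the leading principal components) than highly expressed genes. After the square root, the variance stays within a narrow band across the same range of means.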

Some more technical reading is here on Wikipedia, which says that if the noise is dependent, the information-preserving optimality of PCA no longer holds.

