Does PCA assume no heteroscedasticity?
DESeq2 stabilizes the variance of count data before running PCA. I've read (mostly on sites discussing DESeq2) that PCA assumes no heteroscedasticity. However, I've had trouble finding mathematical references that explain why PCA makes this assumption. Could someone point me to some?


One way to think about it is this: a PCA plot is an effective way to draw samples in 2 dimensions (rather than in ~10,000 dimensions) such that distances between samples are approximately preserved. However, if you apply the log transformation directly to counts, much of the distance between two samples is contributed by genes with low average read counts, say ~1, because that is where the log-transformed values are noisiest. See the first pair of plots here. The point of variance stabilizing first is to ensure that, across the range of mean counts, genes have an equal chance of contributing to the distance metric.
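To make this concrete, here is a rough sketch in Python (standard library only). It is not DESeq2's method. It assumes pure Poisson counts with no biological variation, uses log2(count + 1) as the "naive" log transform, and uses the square root as a stand-in variance-stabilizing transform, since the square root is the classic stabilizer for Poisson data. The function names are made up for illustration. It computes the variance of the transformed count at several mean counts directly from the Poisson probabilities, without simulation.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(K = k), computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def transformed_variance(mu, transform):
    """Variance of transform(K) for K ~ Poisson(mu), summed over the pmf."""
    # Truncate the sum far into the upper tail, where the mass is negligible.
    k_max = int(mu + 12 * math.sqrt(mu) + 30)
    first_moment = 0.0
    second_moment = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, mu)
        y = transform(k)
        first_moment += p * y
        second_moment += p * y * y
    return second_moment - first_moment ** 2


def log2_plus_one(k):
    # Assumed "naive" transform: log2 with a pseudocount of 1.
    return math.log2(k + 1)


def square_root(k):
    # Assumed stand-in for a variance-stabilizing transform (Poisson case only).
    return math.sqrt(k)


means = [1, 10, 100, 1000]
log_var = {mu: transformed_variance(mu, log2_plus_one) for mu in means}
sqrt_var = {mu: transformed_variance(mu, square_root) for mu in means}

for mu in means:
    print(f"mean={mu:5d}  var(log2(K+1))={log_var[mu]:.4f}  var(sqrt(K))={sqrt_var[mu]:.4f}")
```

Under these assumptions, the variance of the log-transformed count at a mean of ~1 is more than a hundred times larger than at a mean of ~1000, so low-count genes dominate squared Euclidean distances between samples. After the square root, the variance stays within a factor of two across the same range. Real RNA-seq counts are overdispersed, which is why DESeq2 uses transformations designed for that case rather than a plain square root.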

Some more technical reading is here on Wikipedia, which says that if the noise is dependent, the information-preserving optimality property of PCA does not hold.


