Does PCA assume no heteroscedasticity?
Jake
@jake-7236
United States

DESeq2 stabilizes the variance of count data before running PCA. I've read (mostly on sites discussing DESeq2) that PCA assumes no heteroscedasticity. However, I've had trouble finding math references that explain why PCA assumes this. Could someone point me to some?

Thanks

deseq2 pca
@mikelove
United States

One way to think about it is this: a PCA plot is an effective way to draw samples in 2 dimensions (rather than in ~10,000 dimensions), such that distances between samples are approximately preserved. However, if you apply the log transformation directly to counts, much of the distance between two samples is contributed by genes with low average read counts, say ~1, because those genes have the largest variance on the log scale. See the first pair of plots here. The point of variance stabilizing first is to ensure that, across the range of mean counts, genes have an equal chance of contributing to the distance metric.
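To make the mean-variance point concrete, here is a rough sketch (not what DESeq2 actually does). It assumes purely Poisson counts, which is a simplification of real RNA-seq data, and uses the classical Anscombe transform 2*sqrt(K + 3/8) as a stand-in for a variance-stabilizing transformation. The function names are made up for illustration. It computes the variance of the transformed count at several mean counts by summing over the Poisson distribution, with no simulation involved.

```python
import math


def poisson_pmf(k, mu):
    # Evaluate the Poisson probability in log space to avoid overflow for large k.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def transformed_variance(mu, transform):
    """Variance of transform(K) for K ~ Poisson(mu), by direct summation."""
    # Sum far enough into the upper tail that the truncated mass is negligible.
    upper = int(mu + 12 * math.sqrt(mu) + 30)
    probs = [poisson_pmf(k, mu) for k in range(upper + 1)]
    values = [transform(k) for k in range(upper + 1)]
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))


def log2_plus_one(k):
    # Log transform with a pseudocount of 1 so that zero counts are defined.
    return math.log2(k + 1)


def anscombe(k):
    # Classical variance-stabilizing transform for Poisson data
    # (an assumption for illustration, not DESeq2's transformation).
    return 2.0 * math.sqrt(k + 0.375)


for mu in (1, 10, 100, 1000):
    v_log = transformed_variance(mu, log2_plus_one)
    v_vst = transformed_variance(mu, anscombe)
    print(f"mean count {mu:5d}: var[log2(K+1)] = {v_log:.4f}, var[anscombe(K)] = {v_vst:.4f}")
```

Under these assumptions, the log-scale variance shrinks steadily as the mean count grows, so low-count genes contribute most of the noise to sample-to-sample distances, while the stabilized variance stays roughly constant across the range of means.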

Some more technical reading is on Wikipedia, which says that if the noise is dependent, the information-preserving optimality of PCA no longer holds:

https://en.wikipedia.org/wiki/Principal_component_analysis#PCA_and_information_theory
