PCA on DESeq2 gene data set
sally_b86


Just a quick question, please. I made a PCA plot of gene expression data from DESeq2 using the genespca function in the pcaExplorer package. I got two subsets separated along the second dimension, and on checking the genes I found that the two subsets represent up- and down-regulated genes. My question is: what does PC1 represent? I was comparing test to control only, with 8 replicates each.

Thank you in advance.

pcaexplorer genespca deseq2

Wikipedia has a paragraph on the intuition behind PCA:


PC1, the first principal component, represents the "direction" or "axis" in the space of the genes that captures the most variance among samples. By default we look at the top 500 genes with the most variance of the transformed counts.

So you can imagine: if you have a block of 10 genes that show very large DE changes across conditions, and these changes are large relative to the variance across samples for the other genes in the experiment, then these 10 genes would contribute heavily to PC1, and PC1 would separate the samples by condition.
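
To make this concrete, here is a rough sketch in Python with NumPy on simulated data. It is not the DESeq2 or pcaExplorer code. The function name top_variance_pca, the simulated matrix, the noise level, and the size of the shift are all assumptions chosen for illustration. It keeps the highest-variance genes, centers them, and takes PC1 from an SVD, with 8 control and 8 test samples and a block of 10 genes shifted in the test group.

```python
import numpy as np


def top_variance_pca(expr, ntop=500):
    """PCA of samples using the ntop highest-variance genes.

    expr is a genes x samples matrix of already transformed values.
    Hypothetical helper for illustration only.
    """
    gene_var = expr.var(axis=1, ddof=1)
    # Indices of the ntop genes with the largest variance across samples
    keep = np.argsort(gene_var)[::-1][: min(ntop, expr.shape[0])]
    x = expr[keep].T                      # samples x genes
    x = x - x.mean(axis=0)                # center each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s                        # sample coordinates on each PC
    pct_var = s**2 / np.sum(s**2)         # fraction of variance per PC
    return scores, pct_var, keep, vt      # rows of vt are gene loadings


# Simulated data (assumption): 1000 genes, 8 control + 8 test samples
rng = np.random.default_rng(0)
condition = np.array([0] * 8 + [1] * 8)
expr = rng.normal(loc=8.0, scale=0.3, size=(1000, 16))

# A block of 10 genes with a large shift in the test group
expr[:10, condition == 1] += 4.0

scores, pct_var, keep, loadings = top_variance_pca(expr, ntop=500)
pc1 = scores[:, 0]

# The 10 genes with the largest absolute loading on PC1
top_genes = keep[np.argsort(np.abs(loadings[0]))[::-1][:10]]

print("PC1 scores, control:", np.round(pc1[condition == 0], 2))
print("PC1 scores, test:   ", np.round(pc1[condition == 1], 2))
print("Fraction of variance, PC1 and PC2:", np.round(pct_var[:2], 3))
print("Genes driving PC1:", np.sort(top_genes))
```

In this simulation the genes with the largest PC1 loadings are the shifted block, and the PC1 scores split the samples by condition. The sign of PC1 is arbitrary, so which group lands on the positive side carries no meaning.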


