Question: Difference in PCA variance calculation in DESeq2
ag1805x20 wrote:

I was trying to plot a PCA using both the DESeq2 plotPCA function and the prcomp function. However, the percentages of variance I obtained were quite different. Why is this? Code for the PCA using prcomp:

library(ggplot2)
# PCA on samples: prcomp expects samples in rows, so transpose the matrix
pca <- prcomp(t(countsPC_batch))
# Percentage of total variance explained by each principal component
percentage <- round(((pca$sdev^2) / (sum(pca$sdev^2))) * 100, 2)
pca_data <- data.frame(pca$x, SampleType=factors_new$SampleType, StudyAccession=factors_new$StudyAccession)
tiff(filename=paste0("Sample_PCA", OutputNumber, ".tiff"), height=10, width=10, units='in', res=300)
ggplot(pca_data, aes(x=PC1, y=PC2, shape=SampleType, col=StudyAccession)) +
  geom_point(size = 4) +
  labs(title="Sample PCA", subtitle=paste0("Samples = ", SamplesUsed, " Normalization=", NormalizationUsed)) +
  # percentage is a vector with one entry per PC, so index it for each axis
  xlab(paste0("PC1: ", percentage[1], "% variance")) +
  ylab(paste0("PC2: ", percentage[2], "% variance")) +
  theme(...)
dev.off()


The Proportion of Variance from summary(pca) was consistent with the calculated percentages.

Further, through hierarchical clustering, I observed two major clusters, but in the PCA plot I think there are three groups.

Answer: Difference in PCA variance calculation in DESeq2
Michael Love wrote:

Take a look at ?plotPCA which I think will answer your question.

Thank you Mike. So it performs PCA on the top 500 genes by variance.
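For anyone comparing the two approaches, here is a minimal sketch of what that gene selection looks like when done by hand before calling prcomp. It assumes a genes x samples matrix of transformed values; the simulated `mat` below is a hypothetical stand-in for real data, and `apply(mat, 1, var)` is used only to keep the example in base R. The value 500 is the documented default of the `ntop` argument of plotPCA.

```r
# Hypothetical stand-in for a transformed expression matrix (genes x samples).
set.seed(1)
mat <- matrix(rnorm(2000 * 6), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))
# Add a group effect to the first 100 genes so that some genes are more variable.
mat[1:100, 4:6] <- mat[1:100, 4:6] + 3

ntop <- 500  # default of plotPCA's ntop argument

# Rank genes by their variance across samples and keep the ntop most variable.
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

pca_all <- prcomp(t(mat))             # all genes, as in the question
pca_top <- prcomp(t(mat[select, ]))   # only the top-variance genes

# Percentage of variance explained by each PC.
pct <- function(p) round(100 * p$sdev^2 / sum(p$sdev^2), 2)
pct_all <- pct(pca_all)
pct_top <- pct(pca_top)
print(rbind(all_genes = pct_all[1:2], top_genes = pct_top[1:2]))
```

Because the two PCAs are computed on different sets of genes, the percentages for PC1 and PC2 will generally not match. Running prcomp on the same subset of the same transformed matrix should bring the two sets of numbers into agreement.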

Can you help me with the second part of the question:

Further, through hierarchical clustering, I observed two major clusters, but in the PCA plot I think there are three groups.


Sure, these are just different techniques for visualizing high-dimensional data, and they won’t give the “same” answer. Also, there’s a subjective component on top: you are determining by eye where to cut an agglomerative tree and how many groups are in the PCA.
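To illustrate the point about cutting the tree, here is a small sketch on simulated data (the matrix `x`, the group centers, and the sample names are all hypothetical, not taken from the dataset in the question). The same dendrogram can be cut into two or three clusters, and which one looks "right" is a choice made by the analyst.

```r
set.seed(1)
# Hypothetical data: 12 samples x 50 features, three groups, two of them close together.
centers <- c(0, 1.5, 6)
grp <- rep(1:3, each = 4)
x <- matrix(rnorm(12 * 50), nrow = 12) + centers[grp]
rownames(x) <- paste0("s", 1:12)

# Agglomerative clustering on Euclidean distances (complete linkage by default).
hc <- hclust(dist(x))
k2 <- cutree(hc, k = 2)   # cut the tree into two clusters
k3 <- cutree(hc, k = 3)   # cut the same tree into three clusters
print(table(k2, k3))      # shows how the three clusters nest within the two

# The same samples projected onto the first two principal components.
pca <- prcomp(x)
print(round(pca$x[, 1:2], 2))
```

Cutting at k = 2 or k = 3 uses the same tree, so the three-cluster solution is always a refinement of the two-cluster one. Whether two or three groups are reported depends on where the cut is placed, just as the number of groups seen in a PCA plot depends on how the points are read by eye.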