Question: Difference in PCA variance calculation in DESeq2
ag1805x (University of Allahabad) wrote, 4 weeks ago:

I was trying to plot a PCA using both the DESeq2 plotPCA function and the prcomp function. However, the percentages of variance I obtained were quite different. Why is this?

[Figure: PCA plot from prcomp]

[Figure: PCA plot from DESeq2 plotPCA]

Code for PCA using prcomp:

library(ggplot2)

# PCA on samples (transpose so rows = samples); this uses all genes
pca <- prcomp(t(countsPC_batch))
# Percentage of total variance explained by each principal component
percentage <- round(((pca$sdev^2) / (sum(pca$sdev^2))) * 100, 2)
pca_data <- data.frame(pca$x, SampleType=factors_new$SampleType, StudyAccession=factors_new$StudyAccession)
# Write the plot to a TIFF file; dev.off() closes the device
tiff(filename=paste0("Sample_PCA", OutputNumber, ".tiff"), height=10, width=10, units='in', res=300)
ggplot(pca_data,aes(x=PC1,y=PC2, shape=SampleType, col=StudyAccession )) +
 geom_point(size = 4) +
 labs(title="Sample PCA", subtitle=paste0("Samples = ", SamplesUsed, " Normalization=", NormalizationUsed))+
 xlab(paste0("PC1: ", percentage[1], "% variance")) +
 ylab(paste0("PC2: ", percentage[2], "% variance")) +
 theme(...)
dev.off()

The Proportion of Variance from summary(pca) was consistent with the calculated percentages.
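For readers who want to reproduce that consistency check, here is a minimal sketch on simulated data. The matrix `mat` is made up for illustration and is not the poster's `countsPC_batch`. Note that `summary()` rounds the proportions to five decimal places, so the comparison uses a small tolerance.

```r
set.seed(1)
# Hypothetical data: 200 genes (rows) x 6 samples (columns)
mat <- matrix(rnorm(200 * 6), nrow = 200)

# PCA on samples, as in the question
pca <- prcomp(t(mat))

# Proportion of variance computed by hand from the standard deviations
manual <- pca$sdev^2 / sum(pca$sdev^2)

# The same quantity as reported by summary()
from_summary <- summary(pca)$importance["Proportion of Variance", ]

# Largest absolute difference between the two (rounding error only)
max_diff <- max(abs(manual - unname(from_summary)))
print(max_diff)
```

So the prcomp percentages themselves are computed correctly; any discrepancy with plotPCA must come from the input to the PCA rather than from this formula.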

Further, hierarchical clustering showed two major clusters, but in these PCA plots there appear to be three groups.

Tags: deseq2, hclust, pca
Answer: Difference in PCA variance calculation in DESeq2
Michael Love (United States) wrote, 4 weeks ago:

Take a look at ?plotPCA which I think will answer your question.


Thank you Mike. So it performs PCA on the top 500 genes by variance.
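To illustrate why this changes the percentages, the sketch below mimics the gene selection step in base R on simulated data. The matrix `mat`, its dimensions, and the helper `percent_var` are hypothetical, and `apply(mat, 1, var)` stands in for the row-variance helper that plotPCA uses. The value `ntop = 500` matches the plotPCA default described on its help page.

```r
set.seed(42)
# Hypothetical transformed expression matrix: 2000 genes x 8 samples
n_genes <- 2000
n_samples <- 8
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
# Make the first 100 genes differ between two groups of 4 samples
mat[1:100, 5:8] <- mat[1:100, 5:8] + 4

# Hypothetical helper: percent variance per PC for a genes x samples matrix
percent_var <- function(m) {
  pca <- prcomp(t(m))
  round(100 * pca$sdev^2 / sum(pca$sdev^2), 2)
}

# Select the ntop genes with the highest variance across samples
ntop <- 500
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

pct_all <- percent_var(mat)            # PCA on all genes (prcomp in the question)
pct_top <- percent_var(mat[select, ])  # PCA on the top-variance genes only

print(pct_all[1:2])
print(pct_top[1:2])
```

Passing the same gene subset to prcomp, or setting `ntop` in plotPCA to the total number of genes, should bring the two sets of percentages into agreement.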

Can you help me with the second part of the question:

Further, hierarchical clustering showed two major clusters, but in these PCA plots there appear to be three groups.

(ag1805x, 4 weeks ago)

Sure, these are just different techniques for visualizing high-dimensional data, and they won’t give the “same” answer. There’s also a subjective component on top: you are deciding by eye where to cut an agglomerative tree and how many groups there are in the PCA.
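As a small illustration of that subjective step, the sketch below clusters simulated samples and cuts the same tree at two different heights. The data, group structure, and sample names are invented for the example; nothing here is taken from the poster's dataset.

```r
set.seed(7)
# Hypothetical data: 9 samples (rows) x 50 features, three loosely separated groups
centers <- rep(c(0, 2, 4), each = 3)
x <- matrix(rnorm(9 * 50, mean = centers), nrow = 9)
rownames(x) <- paste0("sample", 1:9)

# Agglomerative clustering on Euclidean distances (complete linkage by default)
hc <- hclust(dist(x))

# The same tree gives 2 or 3 groups depending on where it is cut
groups_k2 <- cutree(hc, k = 2)
groups_k3 <- cutree(hc, k = 3)

print(table(groups_k2))
print(table(groups_k3))
```

Both cuts are valid readings of one dendrogram, which is why a "two cluster" tree and a "three group" PCA plot need not contradict each other.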

(Michael Love, 4 weeks ago)

Actually, my aim was to check whether, after batch effect removal, the samples cluster by the two sample types as desired. The hclust results show this, and in the PCA, PC1 explains 63% of the variance. So I guess the batch correction has worked.

(ag1805x, 4 weeks ago)