@akula-nirmala-nihnimh-c-5007
Hi,
I am using DESeq2 for PCA analysis. The PCA plot it generates shows 18% variance for PC1 and 7% for PC2. However, when I export the proportion of variance for all PCs, PC1 is 31.6%, PC2 is 25.8%, and so on.
Can someone explain why there is such a large difference between the two?
Thank you very much. Nirmala
Hi Michael,
Thanks for your response. I used ntop=500 when computing the proportion of variance, and all of the vsd data for the PCA plot.
When I change to ntop=21228, the percent variance for PC1 is 91%.
Any suggestions? Thanks, Nirmala
Suggestions for what? It sounds like you've figured out the discrepancy. It's up to you how many genes to include in the PC analysis.
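To see why the choice matters, here is a minimal Python sketch of the steps plotPCA performs, based on my reading of the plots.R source: rank genes by their variance across samples (rowVars), keep the ntop most variable ones (order(rv, decreasing=TRUE)), run prcomp on the transposed matrix, and report percentVar as pca$sdev^2 / sum(pca$sdev^2). The helper name percent_var_top_genes and the simulated matrix are hypothetical. Because the denominator only includes the selected genes, the same data give different PC1 percentages for different ntop values.

```python
import numpy as np


def percent_var_top_genes(mat, ntop=500):
    """Hypothetical sketch of plotPCA's percent-variance calculation.

    mat: genes x samples matrix of transformed values (e.g. VST output).
    Returns the fraction of variance for each PC, computed over the
    ntop most variable genes only.
    """
    # Variance of each gene across samples (rowVars in the R code).
    rv = mat.var(axis=1, ddof=1)
    # Indices of the ntop most variable genes, largest variance first.
    select = np.argsort(rv)[::-1][: min(ntop, len(rv))]
    # Samples are observations, selected genes are variables (t() in R).
    x = mat[select, :].T
    # prcomp centers each variable by default and does not scale.
    x = x - x.mean(axis=0)
    # Squared singular values are proportional to the PC variances (sdev^2).
    var = np.linalg.svd(x, compute_uv=False) ** 2
    # Denominator is the total variance of the selected genes, not all genes.
    return var / var.sum()


# Hypothetical data: 2000 genes x 6 samples, with 100 genes shifted
# in the last three samples to mimic a condition effect.
rng = np.random.default_rng(0)
vst = rng.normal(size=(2000, 6))
vst[:100, 3:] += 3.0

pc1 = {}
for ntop in (100, 500, 2000):
    pc1[ntop] = percent_var_top_genes(vst, ntop)[0]
    print(f"ntop={ntop}: PC1 = {100 * pc1[ntop]:.1f}%")
```

In this simulated example PC1's share shrinks as ntop grows, because the extra genes are mostly noise; with real data the direction of the change depends on which genes get added.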
The information is in the manual pages. Basically, the DESeq2 PCA implementation (plotPCA) by default selects the 500 most variable genes and then conducts PCA on these. This number of variables is controlled by the ntop parameter. Because PCA is fundamentally based on the covariance of the selected variables, the value of ntop will ultimately affect the proportion of variance explained by each derived PC. It works the same way in my own PCA implementation in the PCAtools Bioconductor package.
Hi Michael and Kevin,
No matter how many top genes I use (I tried ntop=250, ntop=500, ntop=1000, ntop=5000 and ntop=20000), I cannot get the PC1 variance explained to match the PC1 value on the DESeq2 PCA plot. Here's what I get:
ntop=250: PC1 = 31.6%
ntop=500: PC1 = 31.6%
ntop=1000: PC1 = 31.9%
ntop=5000: PC1 = 55%
ntop=21228 (all genes): PC1 = 91%
For the plot, I used the command plotPCA(vsd, intgroup="condition") and got PC1 = 18%.
How can this be explained?
Thanks, Nirmala
You have the exact code in hand that produces the plot, so what’s the issue? Are you running it on the same data?
Here’s the code to generate the plot
You have a bug in your code. Compare your code and mine here:
https://github.com/mikelove/DESeq2/blob/master/R/plots.R#L206
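The thread does not say what the bug was. As one hypothetical illustration of how a hand-rolled PCA can disagree with plotPCA, the Python sketch below (simulated data, assumed for illustration only) runs the same PCA twice: once with samples as observations, as prcomp(t(...)) does, and once with the matrix left untransposed, so genes become the observations and centering happens per sample instead of per gene. The helper name pc1_fraction is hypothetical.

```python
import numpy as np


def pc1_fraction(x):
    """PC1 share of variance; rows are observations, columns are variables."""
    x = x - x.mean(axis=0)  # center each column, as prcomp does by default
    var = np.linalg.svd(x, compute_uv=False) ** 2
    return var[0] / var.sum()


# Hypothetical genes x samples matrix: 500 genes, 6 samples, a group
# effect in 50 genes, plus gene-specific baseline expression levels.
rng = np.random.default_rng(1)
vst = rng.normal(size=(500, 6))
vst[:50, 3:] += 3.0
vst += rng.normal(scale=2.0, size=(500, 1))

correct = pc1_fraction(vst.T)  # samples as observations, like prcomp(t(...))
swapped = pc1_fraction(vst)    # genes as observations: centers per sample
print(f"samples as rows: {100 * correct:.1f}%  genes as rows: {100 * swapped:.1f}%")
```

In the untransposed version, the per-gene baseline levels are never removed, so they dominate PC1 and inflate its share; orientation and centering are worth checking first when a manual PCA does not match plotPCA.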