DESeq2 PC1 variance on the PCA plot does not match proportion of variance
@akula-nirmala-nihnimh-c-5007

Hi,

I am using DESeq2 for PCA. The PCA plot it generates shows 18% variance for PC1 and 7% for PC2. When I compute the proportion of variance for all PCs myself, I get 31.6% for PC1, 25.8% for PC2, and so on.

Can someone explain why there is such a large difference between the two?

Thank you very much. Nirmala

@mikelove

The best way to figure out what's going on here is to check the help page for ?plotPCA:

Note that the source code of plotPCA is very simple. The source can be found...

You can copy the source code of plotPCA into your script and see how we compute percent variance. Note the ntop argument.
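
For readers following along, the percent-variance computation comes down to a few lines. What follows is a minimal base-R sketch on a simulated matrix, not the DESeq2 source itself: `mat` is a made-up stand-in for `assay(vsd)`, and `percent_var_top` is a hypothetical helper name.

```r
# Simulated stand-in for assay(vsd): 2000 genes x 8 samples (assumption, not real data)
set.seed(1)
mat <- matrix(rnorm(2000 * 8), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("s", 1:8)))

# Hypothetical helper: keep the ntop most variable rows, run PCA with
# samples in rows, and return the fraction of variance carried by each PC.
percent_var_top <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)                                   # per-gene variance
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[select, , drop = FALSE]))              # PCA on samples
  pca$sdev^2 / sum(pca$sdev^2)
}

pv <- percent_var_top(mat, ntop = 500)
round(100 * pv[1:2], 1)   # percentages comparable to the PC1/PC2 axis labels
```

The key point is that the gene ranking and the PCA both operate on the same matrix, and that `ntop` decides how many rows reach `prcomp`.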


Hi Michael,

Thanks for your response. I used ntop=500 when computing the proportion of variance myself, and passed all of the vsd data to plotPCA.

When I change ntop to 21228, the percent variance for PC1 is 91%.

Any suggestions? Thanks, Nirmala


Suggestions for what? It sounds like you've figured out the discrepancy. It's up to you how many genes to include in the PC analysis.


The information is in the manual pages. By default, the DESeq2 PCA implementation selects the 500 variables with the highest variance and then runs PCA on those. The number of variables is controlled by the ntop parameter. Because PCA is fundamentally based on covariance, the value of ntop affects the proportion of variation explained by each derived PC. It is the same as my own PCA implementation in the PCAtools Bioconductor package.
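
To make the dependence on ntop concrete, here is a small base-R sketch on simulated data (an assumption; it does not call DESeq2). A group effect is planted in 50 of 2000 genes, and PC1's share of variance is computed for several values of ntop. The last lines check that the variances reported by `prcomp` are the leading eigenvalues of the covariance matrix of the selected genes. `pc1_share` is a hypothetical helper name.

```r
set.seed(42)
n_genes <- 2000
group <- rep(c("A", "B"), each = 4)
mat <- matrix(rnorm(n_genes * 8), nrow = n_genes)       # pure noise, 8 samples
mat[1:50, group == "B"] <- mat[1:50, group == "B"] + 4  # signal in 50 genes only

# Hypothetical helper: PC1's fraction of variance using the ntop most variable genes
pc1_share <- function(mat, ntop) {
  rv <- apply(mat, 1, var)
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[select, , drop = FALSE]))
  (pca$sdev^2 / sum(pca$sdev^2))[1]
}

ntops <- c(50, 500, 2000)
shares <- sapply(ntops, function(k) pc1_share(mat, k))
names(shares) <- ntops
round(100 * shares, 1)   # PC1 percent variance for each ntop

# PCA variances equal the eigenvalues of the covariance matrix of the selected genes
top50 <- order(apply(mat, 1, var), decreasing = TRUE)[1:50]
x <- t(mat[top50, ])
ev <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
pca_var <- prcomp(x)$sdev^2
all.equal(ev[1:7], pca_var[1:7])
```

In this simulation, adding noise-only genes dilutes PC1. On real data the direction of the change depends on which genes carry the structure, so the percentages are only comparable when ntop and the input matrix are the same.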


Hi Michael and Kevin,

No matter how many genes I use for ntop (I tried ntop=250, 500, 1000, 5000 and 20000), I cannot get the PC1 variance explained to match PC1 on the DESeq2 PCA plot. Here's what I get:

ntop=250, PC1=31.6%
ntop=500, PC1=31.6%
ntop=1000, PC1=31.9%
ntop=5000, PC1=55%
ntop=21228 (all genes), PC1=91%

For the plot I used the command plotPCA(vsd, intgroup="condition"), and I get PC1=18%.

How can this be explained?

Thanks, Nirmala


You have the exact code in hand that produces the plot, so what’s the issue? Are you running it on the same data?


vsd=varianceStabilizingTransformation(dds.sva,blind=FALSE)
library(genefilter)
rv = rowVars(assay(vsd))
ntop = 500
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,length(rv)))]
pca <- prcomp(t(assay(dds)[select, ]))
percentVar <- pca$sdev^2/sum(pca$sdev^2)
percentVar
[1] 3.169098e-01 2.582494e-01 2.116774e-01 7.875769e-02 4.577093e-02

Here’s the code to generate the plot

plotPCA(vsd, intgroup="condition")
data <- plotPCA(vsd, intgroup="condition", returnData=TRUE)
dev.off()


You have a bug in your code. Compare your code and mine here:

https://github.com/mikelove/DESeq2/blob/master/R/plots.R#L206
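
Comparing the two, one visible difference is that the posted snippet ranks genes with assay(vsd) but then runs prcomp on assay(dds), which is a different object holding untransformed counts, whereas plotPCA uses the same transformed matrix for both steps. With the objects from this thread, the corresponding change would presumably be to pass assay(vsd)[select, ] to prcomp. The base-R sketch below uses simulated counts and log2(x + 1) as a crude stand-in for the variance stabilizing transformation (both are assumptions, not DESeq2 output) to show that the two routes give different percent variance.

```r
set.seed(7)
n_genes <- 1000
n_samples <- 6

# Simulated raw counts with gene means spanning several orders of magnitude
mu <- 2^runif(n_genes, 2, 14)
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes)

transformed <- log2(counts + 1)   # crude stand-in for assay(vsd), NOT the real VST

# Rank genes on the transformed scale, as in both snippets
rv <- apply(transformed, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(500, length(rv)))]

percent_var <- function(m) {
  pca <- prcomp(t(m))
  pca$sdev^2 / sum(pca$sdev^2)
}

pv_mismatch <- percent_var(counts[select, ])        # PCA on raw counts (posted snippet)
pv_matched  <- percent_var(transformed[select, ])   # PCA on the ranked matrix (plotPCA)

round(100 * rbind(mismatch = pv_mismatch, matched = pv_matched)[, 1:2], 1)
```

On raw counts a handful of highly expressed genes dominate the total variance, which is consistent with PC1's share growing as more genes are included.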
