Question: DESeq2 PCA different from Prcomp PCA
3.5 years ago by
tiago21128710
Brazil
tiago21128710 wrote:

I ran a PCA on the rlog-transformed matrix from DESeq2 and got this plot, in which one of my sample groups did not cluster together.

 plotPCA(rld, intgroup=c("condition"))

http://s3.postimg.org/whlalmp6b/Rplot03.png

Using the same matrix with prcomp from R, the samples cluster more tightly.

cruzi.pca <- prcomp(rldMat2,
                    center = TRUE,
                    scale. = FALSE)

library(ggbiplot)

g <- ggbiplot(pcobj = cruzi.pca, scale = 1, obs.scale = 1, var.scale = 1,
              groups = groups, ellipse = TRUE,
              circle = TRUE, var.axes = FALSE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
               legend.position = 'top')
print(g)
  

http://s4.postimg.org/vdsax2drx/Rplot04.png

How can I decide which plot to use? And why does the same matrix of transformed data cluster so differently? Thank you.

deseq2 pca prcomp • 3.0k views
modified 3.5 years ago by Michael Love • written 3.5 years ago by tiago211287
Answer: DESeq2 PCA different from Prcomp PCA
3.5 years ago by
Michael Love25k
United States
Michael Love25k wrote:

See ?plotPCA, in particular the arguments and the note.

written 3.5 years ago by Michael Love

Adding on to Mike's comment, the difference is most likely due to the number of genes used by the DESeq2::plotPCA function. This number (the ntop argument) defaults to 500, while you take all the genes in the rldMat2 object, at least if rld and rldMat2 hold exactly the same data.

written 3.5 years ago by Federico Marini

Indeed. This explains the difference.

Why would I run the PCA on only 500 genes instead of all of them?

written 3.5 years ago by tiago211287

Making a PCA plot after first ranking the genes by variance across samples, and keeping only the top ones, helps to make the sample groupings clearer. Of course, you can tune this parameter, but 500 is a good number for many RNA-seq datasets.
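
For readers who want to compare the two approaches directly, here is a minimal sketch of the idea discussed in this thread: rank genes by variance, keep the top 500, and run prcomp on that subset versus on all genes. It uses base R only and a simulated matrix as a stand-in for the rlog values, so the names mat, ntop, gene_var, top, pca_top and pca_all are illustrative assumptions, not objects from the original post. With real data you would presumably start from assay(rld) (genes in rows, samples in columns); the exact internals of plotPCA may differ from this sketch.

```r
# Simulated stand-in for the rlog matrix (ASSUMPTION: genes in rows,
# samples in columns, as returned by assay(rld) on real data).
set.seed(1)
n_genes <- 2000
n_samples <- 6
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))

# Rank genes by their variance across samples and keep the top `ntop`.
ntop <- 500
gene_var <- apply(mat, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, length(gene_var)))]

# prcomp expects observations (samples) in rows, so transpose first.
pca_top <- prcomp(t(mat[top, ]), center = TRUE, scale. = FALSE)
pca_all <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Percent of variance explained by each principal component.
pct_top <- 100 * pca_top$sdev^2 / sum(pca_top$sdev^2)
pct_all <- 100 * pca_all$sdev^2 / sum(pca_all$sdev^2)

# Sample coordinates on the first two PCs for each version.
head(pca_top$x[, 1:2])
head(pca_all$x[, 1:2])
```

Plotting pca_top$x and pca_all$x side by side on the real data should show whether the gene selection alone accounts for the different clustering. Note that if the matrix is passed to prcomp without transposing, the PCA is computed on genes rather than samples.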

written 3.5 years ago by Michael Love

Thanks for the clarifications, Michael.

written 3.5 years ago by tiago211287