Question: PCA plot in DESeq2
23 months ago by
pkachroo10
pkachroo10 wrote:

Hi,

For an RNA-seq analysis, I am generating a PCA plot for various strains with three biological replicates each. When I make the PCA plot, I get one symbol on the plot for every replicate. Since the dataset is large, I was wondering if there is a way to show a single symbol per strain (the average of its three biological replicates) instead of all three replicates.

With the DESeq2 package I use:

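library(DESeq2)
library(ggplot2)
# rld: the transformed data object, e.g. the output of rlog(dds)
# returnData=TRUE returns the PC coordinates as a data frame instead of a plot
pcaData <- plotPCA(rld, intgroup=c("clade", "strain"), returnData=TRUE)
# percent of variance explained by PC1 and PC2, used for the axis labels
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=strain, shape=clade)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  coord_fixed()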

Thanks,

Priyanka

23 months ago by
Michael Love
United States
Michael Love wrote:

I've received this question before on the support site, and my answer is that I really don't understand the point of a PCA plot in which you can't see how the samples within a group spread out. I suppose you can compare the distances between 3 or more conditions, but those distances relative to the biological variance are what I'm most interested in seeing in a PCA plot.

If you really want to make this plot despite the shortcomings I've mentioned, you can compute the row-wise average of the transformed values for each condition and make a PCA plot of just the means. The rowMeans() function can be used to compute the means of a subset of the data (the columns belonging to one group), and cbind() can be used to bind the columns of means from the different groups together.
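A minimal sketch of that approach, offered as an illustration rather than a definitive recipe. It uses a simulated matrix as a stand-in for the transformed values so that it runs on its own; with real data you would presumably take the matrix from assay(rld) and the grouping from rld$strain. The names mat, strain, groupMeans and pcaMeans are made up for this example. Note that this runs prcomp() directly on the means, so the coordinates will not match what plotPCA() reports for the individual replicates.

```r
# Simulated stand-in for the transformed values (assumption: with real data,
# mat <- assay(rld) and strain <- rld$strain would take the place of these).
set.seed(1)
strain <- factor(rep(c("A", "B", "C"), each = 3))   # 3 strains x 3 replicates
mat <- matrix(rnorm(200 * 9), nrow = 200,
              dimnames = list(paste0("gene", 1:200),
                              paste0(strain, "_", 1:3)))

# Row-wise mean over the replicate columns of each strain.
# sapply() binds the per-strain mean vectors into a genes x strains matrix,
# which is the same result as calling rowMeans() per group and cbind()-ing.
groupMeans <- sapply(levels(strain), function(s) {
  rowMeans(mat[, strain == s, drop = FALSE])
})

# PCA of the means: prcomp() expects observations (here, strains) in rows.
pca <- prcomp(t(groupMeans))
percentVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2))
pcaMeans <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                       strain = colnames(groupMeans))

# Plot one point per strain, in the same style as the replicate-level plot.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pcaMeans, aes(PC1, PC2, color = strain)) +
    geom_point(size = 3) +
    xlab(paste0("PC1: ", percentVar[1], "% variance")) +
    ylab(paste0("PC2: ", percentVar[2], "% variance")) +
    coord_fixed()
  print(p)
}
```

Because each point is now an average, the plot says nothing about within-strain spread, so it is best read alongside the replicate-level plot rather than in place of it.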


Thanks, Michael. I completely agree with your reasoning. However, I have 30 samples in triplicate, and visualizing the relationships between samples becomes difficult with so many data points. I intend to make both PCA plots: one with individual replicates (to see the spread within samples) and one with the average of the replicates (to see the spread between samples).