Question: PCA plot in DESeq2
pkachroo10 wrote:

Hi,

For an RNA-seq analysis, I am generating a PCA plot for various strains with three biological replicates each. When I make the PCA plot, I get a symbol on the plot for every replicate. For a large dataset, I was wondering if there is a way to have a single symbol (the average of the three biological replicates) represented on the plot, instead of all three replicates.

In the DESeq2 package I use:

library(ggplot2)
data <- plotPCA(rld, intgroup=c("clade", "strain"), returnData=TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
ggplot(data, aes(PC1, PC2, color=strain, shape=clade)) +
geom_point(size=3) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed()

Thanks,

Priyanka

deseq2 pca plot • 1.7k views  modified 2.8 years ago by Michael Love26k • written 2.8 years ago by pkachroo10
Michael Love26k wrote:

I've received this question before on the support site, and my answer is that I really don't understand the point of a PCA plot in which you can't see how the samples within a group spread out. I suppose you can compare the distances between 3 or more conditions, but those distances relative to the biological variance are what I'm most interested in seeing in a PCA plot.

If you really want to make this plot despite the shortcomings I've mentioned, you can compute the row-wise average of the transformed values for each condition and make a PCA plot of just the means. The rowMeans() function can be used to compute the means of a subset of the data, and cbind() can be used to bind the columns of means from the different groups together.
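A minimal sketch of that approach, using base R only. The matrix `mat` here is simulated and merely stands in for the transformed values (with real data it would presumably be something like `assay(rld)`, with `strain` taken from the sample table); the names `mat`, `strain`, `group_means` and `pca_data` are made up for illustration. Note that this runs prcomp() on all rows, whereas plotPCA() first selects the most variable genes, so the result will not match plotPCA() exactly.

```r
# Simulated stand-in for the transformed matrix (assumption: genes in rows,
# samples in columns): 200 genes x 9 samples, 3 strains x 3 replicates.
set.seed(1)
strain <- factor(rep(c("A", "B", "C"), each = 3))
mat <- matrix(rnorm(200 * 9), nrow = 200,
              dimnames = list(paste0("gene", 1:200),
                              paste0(strain, "_rep", 1:3)))

# Row-wise mean over the replicate columns of each strain,
# then bind the per-strain mean columns together.
group_means <- do.call(cbind, lapply(levels(strain), function(s) {
  rowMeans(mat[, strain == s, drop = FALSE])
}))
colnames(group_means) <- levels(strain)

# PCA on the means; prcomp() expects samples (here: strain means) in rows.
pca <- prcomp(t(group_means))
percentVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2))

# One row per strain, ready to pass to ggplot() as in the question.
pca_data <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                       strain = colnames(group_means))
```

With one point per strain, `pca_data` can be plotted with the same ggplot() call as above, using `percentVar[1]` and `percentVar[2]` for the axis labels. The within-group spread is no longer visible, which is the caveat raised above.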