I stabilize my data with rlog and then plot the PCA using the method proposed in the DESeq2 documentation:
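library(DESeq2)
library(ggplot2)
# Build the dataset: histone_m is the count matrix, Design the sample table
dds <- DESeqDataSetFromMatrix(countData = histone_m, colData = Design, design = ~ condition)
# Store the size factors in dds itself (rlog() estimates them if they are missing)
dds <- estimateSizeFactors(dds)
met <- rlog(dds)
# returnData = TRUE returns the PC coordinates instead of drawing the plot
data <- plotPCA(met, intgroup = c("condition"), returnData = TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
myPlot <- ggplot(data, aes(PC1, PC2, color = name)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance"))
print(myPlot)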
Then I decided to run a standard PCA with base R's prcomp:
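library(lattice)  # provides splom()
met <- assay(met)  # extract the rlog-transformed matrix (features x samples)
pca <- prcomp(t(met))  # samples as rows; every row of met is used
screeplot_percent(pca)
col <- c("red", "pink", "black", "blue")
idx <- seq_len(3)
print(splom(pca$x[, idx], col = col, pch = 19))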
The function for the scree plot:
screeplot_percent <- function(x, npcs = min(10, length(x$sdev)), ...) {
  idx <- seq_len(npcs)
  sum_var <- sum(x$sdev ^ 2)
  vars <- 100 * (x$sdev[idx] ^ 2 / sum_var)
  cumvar <- cumsum(vars)
  barplot(vars, width = 0.9, space = 0.1, names.arg = idx, ylim = c(0, 100),
          xlab = "Principal Component", ylab = "Percent Variance",
          xaxp = c(1, npcs, npcs - 1), las = 1)
  lines(x = idx - 0.5, y = cumvar, type = "b", lty = 2)
  legend("bottomright", legend = c("Proportion", "Cumulative"), lty = c(1, 2),
         pch = c(19, 1))
}
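To rule out the scree plot function itself as the source of the discrepancy, the percentages it draws can be checked numerically. This is only a minimal sketch on made-up data: the matrix `toy` is hypothetical and stands in for the rlog matrix. It recomputes the percent variance the same way screeplot_percent does and compares it with what summary() reports for the same prcomp object.

```r
# Hypothetical toy data: 6 samples (rows) x 50 features (columns).
set.seed(1)
toy <- matrix(rnorm(6 * 50), nrow = 6)
pca_toy <- prcomp(toy)

# Percent variance per component, computed as in screeplot_percent().
pct <- 100 * pca_toy$sdev^2 / sum(pca_toy$sdev^2)

# summary() reports the same proportions as rounded fractions.
pct_summary <- 100 * summary(pca_toy)$importance["Proportion of Variance", ]

print(round(pct, 1))
print(round(pct_summary, 1))
```

If the two vectors agree, the formula in the function is fine and the difference must come from the input given to the PCA.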
red -> WEN1
pink -> WEN3
black -> WNN1
blue -> WNN3
The two PCA plots differ somewhat and the explained variance differs a lot, but I do not understand why.
As you can see, WEN3 and WNN3 are much farther apart in the second plot than in the first. The scales on the x-axis and y-axis also differ. What is the reason for this?
In addition, my scree plot says that PC1 explains approximately 75% of the variance, whereas the ggplot axis label built from the plotPCA output reports 87%.
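One thing worth checking, assuming default settings were used: DESeq2's plotPCA has an ntop argument (default 500) and runs the PCA only on the ntop rows with the highest variance, while prcomp(t(met)) uses every row. The sketch below illustrates the effect with base R on simulated data. The matrix `mat`, the helper `pct_var`, and the group effect are hypothetical and chosen only for illustration; they are not the histone data.

```r
# Hypothetical data: 5000 features (rows) x 4 samples (columns).
set.seed(42)
n_features <- 5000
group <- c(0, 0, 1, 1)  # two made-up conditions
mat <- matrix(rnorm(n_features * length(group)), nrow = n_features)

# Add a condition effect to the first 300 features only.
mat[1:300, ] <- mat[1:300, ] + outer(rep(4, 300), group)

# Percent variance per PC, with samples as rows as in prcomp(t(met)).
pct_var <- function(m) {
  p <- prcomp(t(m))
  100 * p$sdev^2 / sum(p$sdev^2)
}

# Row selection as plotPCA does it: the ntop rows with the highest variance.
ntop <- 500
rv <- apply(mat, 1, var)
top <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

pc1_all <- pct_var(mat)[1]         # PCA on all rows
pc1_top <- pct_var(mat[top, ])[1]  # PCA on the most variable rows only

print(round(c(all_rows = pc1_all, top_rows = pc1_top), 1))
```

Restricting the PCA to the most variable rows drops many low-variance rows that mostly contribute noise, so PC1 can take a larger share of the total variance, and the sample coordinates and axis ranges change as well. With the real data, calling plotPCA with ntop set to the number of rows of the rlog object (before met is overwritten by assay(met)) should make both analyses use the same rows.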