Why bother using the DESeq2 functions? It is better to just use the prcomp function in R for the PCA and ggplot2 for the plot, since that is more customisable (see example below). The same goes for clustering: you don't have to use DESeq2. Read the NMF aheatmap manual, for example, or pick one of the other options. Any of these would be easier to work with than the built-in DESeq2 functions, which are wrapped up in so much other stuff that they are harder to customise.
library(ggplot2)
library(RColorBrewer)

# prcomp treats rows as observations, so data2 needs samples in rows
# (use t(data2) if it is genes x samples)
pca1 <- prcomp(data2)
scores <- data.frame(pca1$x)
# 'names' is the vector of group labels, one per sample
scores$type <- factor(names)
row.names(scores) <- des4$ID

# one Set3 colour per group, in random order (works for up to 9 groups)
myColors <- sample(brewer.pal(9, "Set3"), nlevels(scores$type))
names(myColors) <- levels(scores$type)
colScale <- scale_colour_manual(name = "type", values = myColors)

ggplot(data = scores, aes(x = PC1, y = PC2, colour = type, label = rownames(scores))) +
  geom_point(size = 5) +
  theme_bw() +
  theme(axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 14),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        legend.title = element_blank(),
        legend.text = element_text(size = 14)) +
  colScale
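For the clustering point, here is a minimal sketch that uses only base R (hclust on sample-to-sample distances) rather than NMF's aheatmap, which is better learned from its own manual. The matrix `mat` is simulated; its name, dimensions and the two-group layout are assumptions made purely for illustration.

```r
set.seed(1)
# Hypothetical data: 200 genes (rows) x 6 samples (columns), two groups of three
mat <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
mat[1:50, 4:6] <- mat[1:50, 4:6] + 3   # shift 50 genes in the second group

# Euclidean distances between samples (dist works on rows, hence t())
sample_dist <- dist(t(mat))

# Hierarchical clustering of the samples, then cut the tree into two clusters
hc <- hclust(sample_dist, method = "complete")
clusters <- cutree(hc, k = 2)

plot(hc)   # dendrogram of the samples
```

Because everything here is a plain R object, the distance measure, linkage method and plotting are all easy to swap out.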
The source code for plotPCA is simple and easy to customize; we say as much in the help page ?plotPCA. It is commented to explain what we are doing in each step.
Here's the source:
https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R#L162-L201
That's just a more complicated version of what I posted. It is better to go to the original packages yourself and just do it that way. That is just my view anyway.
Fair enough. That's just like, your opinion, man :)
I'll just say that the selection of rows by highest variance makes a big difference: it helps to "bring into focus" the sample clusters.
Also, I find that annotating the axes with percent variance is useful in assessing what is being shown. It answers the question: is this basically all of the sample-sample variability, or is the (PC1, PC2) projection showing very little of the total variance because the scree plot is fairly flat?
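A minimal sketch of those two points with base R and ggplot2: keep the highest-variance rows before running prcomp, then put the percent variance on the axis labels. The matrix `mat`, the `group` labels and the cutoff `ntop` are made up for illustration; this is not the plotPCA source itself.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 1000 genes (rows) x 6 samples (columns), two groups
mat <- matrix(rnorm(1000 * 6), nrow = 1000,
              dimnames = list(NULL, paste0("s", 1:6)))
mat[1:100, 4:6] <- mat[1:100, 4:6] + 3
group <- factor(rep(c("A", "B"), each = 3))

# 1. Keep the ntop rows with the highest variance across samples
ntop <- 200  # hypothetical cutoff
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

# 2. PCA on the selected rows, with samples as observations
pca <- prcomp(t(mat[select, ]))

# 3. Fraction of the total variance carried by each component
percentVar <- pca$sdev^2 / sum(pca$sdev^2)

scores <- data.frame(pca$x[, 1:2], group = group)
p <- ggplot(scores, aes(x = PC1, y = PC2, colour = group)) +
  geom_point(size = 5) +
  xlab(paste0("PC1: ", round(100 * percentVar[1]), "% variance")) +
  ylab(paste0("PC2: ", round(100 * percentVar[2]), "% variance"))
print(p)

# Scree plot: a flat profile means PC1 and PC2 show little of the total
barplot(100 * percentVar,
        names.arg = paste0("PC", seq_along(percentVar)),
        ylab = "% variance")
```

Rerunning with `ntop` set to the full number of rows gives a quick feel for how much the row selection changes the picture.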