I am working with a set of genes across 20 samples, so I have been using the DESeq2 package. When plotting a PCA (after normalising with vst etc.), I would like to draw ellipses at a high confidence level to justify the clustering. At the moment this works very well with the t-distribution and normal distribution options. However, I am unsure how my data are distributed (normal, t-distribution?). How can I find out?
Another option is to use the Euclidean distance and decrease the confidence level. But is this recommended?
This is more of a conceptual question than a code question, so I am not expecting answers that include code, but here is what I have used:
library(ggplot2)
# Assumes percentVar is a length-2 vector, e.g. round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(x = PC1, y = PC2, color = A, shape = B)) +
  geom_point(size = 3) +
  scale_color_gradientn(colours = rainbow(10)) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  coord_fixed() +
  stat_ellipse(type = "euclid", lty = 2, col = 1)