Hi,
I used to plot PCA using DESeq2, and it works great. In DESeq2, the normalized counts are transformed with the vst function (variance-stabilizing transformation, based on the NB variance ~ expectation relationship?) rather than being log-transformed directly.
I tried limma-voom on my new dataset, which has more than 500 samples with various factors (sex, batch, treatment, genotype, etc.). Then I used the plotMDS function:
plotMDS(lcpm[,subsamples], top=500, col=df_annotation$col[subsamples], labels=NULL, dim = c(1,2))
plotMDS with dim = c(1,2), c(1,3), c(1,4), c(2,3), c(2,4), or c(3,4) showed no obvious separation by any of those known factors.
I noticed that plotMDS uses the log-transformed, TMM-normalized counts directly. Since I can also get the mean-variance relationship from efit, why is there no such vst-transformed expression data for the PCoA plot?
In the plotMDS plot, if I choose gene.selection = "common", is the output identical to a PCA plot of the same log-transformed data? To my understanding, if Euclidean distance is used, PCA and PCoA are identical, aren't they?
Thanks & regards,
Raymond
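For readers following the PCA vs. PCoA question above, here is a minimal base R sketch. It assumes a hypothetical toy matrix of log-expression values (not data from this thread) and only checks the general fact that classical MDS on Euclidean distances gives the same sample coordinates as PCA, up to the sign of each axis. It does not call plotMDS.

```r
set.seed(1)

# Hypothetical toy data: 200 genes (rows) x 6 samples (columns)
x <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

# PCA on samples: prcomp expects samples in rows, so transpose
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)

# Classical MDS (PCoA) on Euclidean distances between samples
mds <- cmdscale(dist(t(x)), k = 2)

# The first two coordinates agree up to the sign of each axis,
# so compare absolute values
max_diff <- max(abs(abs(pca$x[, 1:2]) - abs(mds)))
print(max_diff)  # expected to be at floating-point noise level
```

The equivalence only holds when both methods see the same genes and the distance is Euclidean, which is why the gene selection setting matters for the comparison.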
Thanks, Gordon. Based on your experience, when would you set gene.selection="common", and when gene.selection="pairwise"? Is there a rule of thumb?
I use "pairwise" unless the number of samples is very large. With a large number of samples, "pairwise" is quadratically slow (a separate gene selection is made for every pair of samples), so I switch to "common".
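To illustrate why the cost differs, here is a rough base R sketch of the two selection strategies. It is an assumption-laden illustration, not limma's actual code: the matrix x, the value of top, and the variable names are all hypothetical, and the distance used is a root-mean-square difference over the selected genes.

```r
set.seed(2)

# Hypothetical logCPM-like matrix: 1000 genes (rows) x 8 samples (columns)
x <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)
top <- 100
n <- ncol(x)

# "common" style: one gene set (largest variance across all samples),
# then a single distance computation
v <- apply(x, 1, var)
keep <- order(v, decreasing = TRUE)[seq_len(top)]
d_common <- as.matrix(dist(t(x[keep, ]))) / sqrt(top)

# "pairwise" style: a separate top gene set for every pair of samples,
# so the work grows with the number of pairs, n * (n - 1) / 2
d_pair <- matrix(0, n, n)
n_pairs <- 0
for (i in 2:n) {
  for (j in 1:(i - 1)) {
    sq <- sort((x[, i] - x[, j])^2, decreasing = TRUE)[seq_len(top)]
    d_pair[i, j] <- d_pair[j, i] <- sqrt(mean(sq))
    n_pairs <- n_pairs + 1
  }
}
print(n_pairs)
```

With 500 samples the inner selection would run 124,750 times, which is consistent with switching to "common" for large sample sizes.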
Thanks, Gordon.