Question: PCA plot of variance-stabilizing transformation of normalized counts in limma
Raymond wrote:

Hi,

I used to make PCA plots with DESeq2, and it works well. In DESeq2, the normalized counts are transformed with the vst() function (variance-stabilizing transformation, based on the negative binomial mean-variance relationship?) rather than being log-transformed directly.

I tried limma-voom on my new dataset, which has more than 500 samples with various factors (sex, batches, treatments, genotypes, etc.). Then I used the plotMDS function:

plotMDS(lcpm[, subsamples], top = 500,
        col = df_annotation$col[subsamples],
        labels = NULL, dim.plot = c(1, 2))


Plotting dimensions c(1,2), c(1,3), c(1,4), c(2,3), c(2,4), or c(3,4) with plotMDS showed no obvious separation by any of those known factors.

I noticed that plotMDS uses the log-transformed, TMM-normalized counts directly. Since I can also get the mean-variance relationship from efit, why is there no equivalent vst-transformed expression matrix for the PCoA plot?

In plotMDS, if I choose gene.selection = "common", is the output identical to a PCA plot of the same log-transformed data? My understanding is that PCA and PCoA are identical when Euclidean distance is used; is that right?
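A small base-R sketch of the PCA/PCoA equivalence asked about above. It uses a simulated matrix (the name lcpm_sim is hypothetical and only stands in for a genes x samples log-CPM matrix) and compares prcomp() scores with classical MDS (cmdscale()) on Euclidean distances between samples. The two should agree up to the sign of each axis.

```r
# Sketch with simulated data: classical MDS (PCoA) on Euclidean distances
# reproduces PCA scores up to the sign of each axis.
set.seed(1)
n_genes <- 200
n_samples <- 12
lcpm_sim <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)  # genes x samples

# PCA: samples are observations, genes are variables (genes are centred)
pca <- prcomp(t(lcpm_sim), center = TRUE, scale. = FALSE)
pca_scores <- pca$x[, 1:2]

# PCoA: classical MDS on Euclidean distances between samples
pcoa_scores <- cmdscale(dist(t(lcpm_sim)), k = 2)

# Each axis is only defined up to sign, so align signs column by column
signs <- sign(colSums(pca_scores * pcoa_scores))
aligned <- sweep(pcoa_scores, 2, signs, `*`)

# Should be numerically zero (floating-point noise only)
max(abs(pca_scores - aligned))
```

limma's documentation describes the plotMDS distance as the root-mean-square deviation over the top genes, so with gene.selection = "common" its coordinates would be expected to differ from prcomp() on those same genes only by a constant scale factor (and axis signs).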

Thanks & regards,

Raymond

Modified 4 weeks ago by Steve Lianoglou; written 4 weeks ago by Raymond.
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 4 weeks ago:

Yes, gene.selection="common" will make the MDS distance equivalent to PCA.

To stabilize the variances for the MDS plot, compute log-CPM values using cpm() with log=TRUE and prior.count=5.
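A minimal sketch of that suggestion, assuming the edgeR and limma Bioconductor packages are installed. The counts are simulated (hypothetical data) only so the snippet runs on its own; with real data you would start from your own DGEList. prior.count = 5 is the value from the answer above (cpm()'s default is 2); a larger prior count damps the inflated variance of low-count genes on the log scale.

```r
# Sketch: moderated log-CPM for an MDS plot (assumes edgeR and limma are installed)
library(edgeR)
library(limma)

set.seed(42)
# Hypothetical simulated negative-binomial counts: 1000 genes x 20 samples
counts <- matrix(rnbinom(1000 * 20, mu = 50, size = 5), nrow = 1000)
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)  # TMM normalization factors

# log2-CPM with a larger prior count to stabilize low-count variances
lcpm5 <- cpm(dge, log = TRUE, prior.count = 5)

# MDS on the moderated log-CPM; "common" picks one set of top genes for all samples
mds <- plotMDS(lcpm5, top = 500, gene.selection = "common")
```

gene.selection = "common" is used here because, as discussed below, "pairwise" becomes slow with many samples (the original dataset has more than 500).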

Thanks, Gordon. Based on your experience, when would you set gene.selection="common", and when gene.selection="pairwise"? Is there any rule of thumb?

I use "pairwise" unless the number of samples is very large. With a large number of samples, "pairwise" is quadratically slow, so I switch to "common".