Question

PCA plot of variance stabilized transformation of normalized counts in limma

Raymond ▴ 20

Hi,

I used to make PCA plots with DESeq2, and that worked great. In DESeq2, the normalized counts are transformed with the vst() function (variance stabilizing transformation, based on the NB variance-mean relationship?) rather than being log-transformed directly.

I tried limma-voom on my new dataset, which has more than 500 samples and various factors (sex, batch, treatment, genotype, etc.). Then I used the plotMDS function:

plotMDS(lcpm[,subsamples], top=500, 
        col=df_annotation$col[subsamples], 
        labels=NULL, dim = c(1,2))

plotMDS with dim = c(1,2), c(1,3), c(1,4), c(2,3), c(2,4), or c(3,4) showed no obvious separation by any of the known factors.

I noticed that plotMDS uses the log-transformed, TMM-normalized counts directly. Since I can also get the mean-variance relationship from efit, why is there no such vst-transformed expression data for the PCoA plot?

In plotMDS, if I choose gene.selection = "common", is the output identical to a PCA plot of the same log-transformed data? To my understanding, PCA and PCoA are identical when Euclidean distance is used. Is that right?

Thanks & regards,

Raymond

limma-voom deseq2 EdgeR • 2.8k views

ADD COMMENT • link updated 7.1 years ago by Steve Lianoglou ★ 13k • written 7.1 years ago by Raymond ▴ 20

score 0 · Answer 1 · 2018-10-18

Gordon Smyth 53k

WEHI, Melbourne, Australia

Yes, gene.selection="common" will make the MDS distance equivalent to PCA.
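The PCA/PCoA equivalence under Euclidean distance can be checked in base R without limma. The following is only a minimal sketch on a simulated matrix (the data and variable names are made up for illustration): it compares prcomp() scores with cmdscale() coordinates computed from Euclidean distances between samples. It does not reproduce the top-gene selection that plotMDS applies first.

```r
set.seed(1)
# Hypothetical log-expression matrix: 200 genes (rows) x 12 samples (columns)
x <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# PCA on samples: rows of t(x) are samples; centre each gene, no scaling
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)

# Classical MDS (PCoA) on Euclidean distances between samples
mds <- cmdscale(dist(t(x)), k = 2)

# The first two coordinates agree up to a sign flip per axis,
# so compare absolute values
max_diff <- max(abs(abs(pca$x[, 1:2]) - abs(mds)))
print(max_diff)
```

The difference should be at the level of floating-point rounding error. The sign of each axis is arbitrary in both methods, which is why the comparison uses absolute values.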

To stabilize the variances for the MDS plot, use cpm() with prior.count=5.
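A minimal sketch of that suggestion, assuming an edgeR workflow with TMM normalization as in the question. The count matrix here is simulated and the object names are placeholders. Note that prior.count only has an effect when cpm() is called with log=TRUE.

```r
library(edgeR)  # also loads limma, which provides plotMDS()

set.seed(1)
# Hypothetical count matrix: 1000 genes x 6 samples
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("g", 1:1000), paste0("s", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)   # TMM normalization factors

# A larger prior.count damps the log-CPM variability of low-count genes
lcpm <- cpm(y, log = TRUE, prior.count = 5)

# Compute the MDS coordinates without drawing; drop plot = FALSE to see the plot
mds <- plotMDS(lcpm, top = 500, gene.selection = "common", plot = FALSE)
```

With real data, the lcpm matrix would replace the one passed to plotMDS in the original call, and the colours and sample subset can be supplied as before.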

ADD COMMENT • link 7.1 years ago Gordon Smyth 53k

Thanks, Gordon. Based on your experience, when would you set gene.selection="common", and when gene.selection="pairwise"? Is there a rule of thumb?

ADD REPLY • link 7.1 years ago Raymond ▴ 20

I use "pairwise" unless the number of samples is very large. With a large number of samples, "pairwise" is quadratically slow so I switch to "common".
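To illustrate why the cost differs, here is a rough base-R sketch of the two selection schemes as I understand them from the plotMDS documentation. It is not limma's actual implementation, and the matrix and variable names are hypothetical. "common" picks one gene set for all samples, whereas "pairwise" repeats the top-gene selection for every pair of samples.

```r
set.seed(1)
x <- matrix(rnorm(300 * 8), nrow = 300)  # hypothetical log-CPM: 300 genes x 8 samples
top <- 50
n <- ncol(x)

# "common": one gene set (largest standard deviations) shared by every pair
keep <- order(apply(x, 1, sd), decreasing = TRUE)[1:top]
d_common <- as.matrix(dist(t(x[keep, ]))) / sqrt(top)  # root-mean-square difference

# "pairwise": a separate top-gene selection for each of the choose(n, 2) pairs
d_pairwise <- matrix(0, n, n)
for (i in 2:n) for (j in 1:(i - 1)) {
  sq <- sort((x[, i] - x[, j])^2, decreasing = TRUE)[1:top]
  d_pairwise[i, j] <- d_pairwise[j, i] <- sqrt(mean(sq))
}

# Number of pairwise selections needed for 500 samples
n_pairs_500 <- choose(500, 2)  # 124750
```

The per-pair sort is what makes "pairwise" scale with the square of the number of samples, while "common" needs only a single gene ranking.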