Question: Different PCA plots using rlog and vsd on the same data set
lirongrossmann wrote 5 months ago:

Hi All,

I have been using DESeq2 to analyze a dataset and ran into a problem I am not sure how to solve.

I used the following code to run DESeq2 on my dataset:


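library(DESeq2)

# ep: raw count matrix (genes x samples); cp: sample table with a Risk column
dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Risk)
dds <- estimateSizeFactors(dds)

# Regularized log transformation, then PCA colored by Risk
rld <- rlog(dds)
plotPCA(rld, intgroup = "Risk")

# Variance stabilizing transformation, then PCA colored by Risk
vsd <- varianceStabilizingTransformation(dds)
plotPCA(vsd, intgroup = "Risk")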


The two PCA plots I got look completely different, so I am not sure which transformation I should rely on for further analysis. 

Any help?



modified 4 months ago • written 5 months ago by lirongrossmann

Thanks. I have two groups that I want to compare in my RNA-seq dataset: one group contains 6 samples, the other contains 100 samples.

When I run DESeq2 I get more than 1000 DE genes. But for some reason, when I plot the PCA using vsd and again using rlog, I see different separation of the groups.

Interestingly, when I narrowed my analysis to 6 vs 6, the plots do look similar.

Is this a known problem when comparing groups of highly unequal size?




written 4 months ago by lirongrossmann

Try setting blind=FALSE in the transformation. The vignette recommends this when there are many large differences.

written 4 months ago by Michael Love
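As a rough illustration of the blind=FALSE suggestion, here is a minimal sketch. It assumes DESeq2 is installed and uses makeExampleDESeqDataSet() to simulate a small stand-in dataset with a `condition` column; with the data from the question, the existing `dds` and intgroup = "Risk" would take their place. The object names `pca_vsd` and `pca_rld` are only illustrative.

```r
library(DESeq2)

# Stand-in data (an assumption, not the poster's dataset): simulated counts
# for 1000 genes and 12 samples with a two-level 'condition' factor.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- estimateSizeFactors(dds)

# blind = FALSE keeps the design formula when the dispersion trend is
# estimated, instead of re-estimating it with an intercept-only design.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)

# returnData = TRUE returns the PC coordinates rather than drawing the plot,
# which makes the two transformations easier to compare side by side.
pca_vsd <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
pca_rld <- plotPCA(rld, intgroup = "condition", returnData = TRUE)
head(pca_vsd)
```

Dropping returnData = TRUE gives the usual plot. Whether blind=FALSE brings the two plots closer together on a 6 vs 100 design is something to check on the real data, not something this sketch can show.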

Thank you. I tried it with the top 30 genes and it didn't work. I was wondering whether the highly unequal sizes of the two groups bias the PCA and the clustering, because when I narrow down to groups of equal size I do see clear separation (with both vsd and rlog).

I would really like to upload the plots, but I don't know where to upload them.

written 4 months ago by lirongrossmann
Michael Love (United States) wrote 5 months ago:

Can you describe the data or the plots? How large are the differences, what is the experimental design, etc.? The vignette has some description of how the transformations differ.

written 5 months ago by Michael Love


I will try it. 

written 4 months ago by lirongrossmann
Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote 4 months ago:

Can you try selecting the top 100 (or 200, 500) genes by baseMean? Or by rowVars of dds or vsd? When you apply PCA to all genes, the 'signal' may be dominated by spurious variation in the many genes with low counts.

Please also try posting the PCA plots.

written 4 months ago by Wolfgang Huber
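The gene-selection idea above can be sketched in base R. This is only an illustration under stated assumptions: `mat` is a simulated genes x samples matrix standing in for assay(vsd) or assay(rld), `group` is a made-up two-level factor, and `pca_top` is a hypothetical helper, not a DESeq2 function. Note that plotPCA itself has an `ntop` argument (default 500) that selects genes by row variance, so plotPCA(vsd, intgroup = "Risk", ntop = 100) may be the shorter route on real data.

```r
# Simulated stand-in for a transformed expression matrix (an assumption);
# with real data this would be something like mat <- assay(vsd).
set.seed(1)
n_genes <- 2000
n_samples <- 12
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))
group <- factor(rep(c("high", "low"), each = n_samples / 2))

# Give the first 50 genes a group difference so there is some signal.
mat[1:50, group == "high"] <- mat[1:50, group == "high"] + 3

# Hypothetical helper: rank genes by a per-row statistic (variance by
# default), keep the top 'ntop', and run PCA with samples as observations.
pca_top <- function(mat, ntop, stat = var) {
  score <- apply(mat, 1, stat)
  keep <- order(score, decreasing = TRUE)[seq_len(min(ntop, length(score)))]
  prcomp(t(mat[keep, , drop = FALSE]))
}

pca100 <- pca_top(mat, 100)
pca500 <- pca_top(mat, 500)

# Proportion of variance carried by the first two components in each case.
summary(pca100)$importance["Proportion of Variance", 1:2]
summary(pca500)$importance["Proportion of Variance", 1:2]
```

To draw the result, plot(pca100$x[, 1:2], col = group, pch = 19) gives a basic scatter of the first two components. Passing stat = mean ranks genes by average value instead, a loose stand-in for ranking by baseMean (which is computed on normalized counts, not on the transformed values).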