Question: Different PCA plots using rlog and vsd on the same data set
8 weeks ago, lirongrossmann wrote:

Hi All,

I have been using DESeq2 to analyze a dataset and ran into a problem I am not sure how to solve.

I have been using the following code to run DESeq2 on my dataset:

 

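library(DESeq2)

# ep: count matrix; cp: sample table containing the Risk column
dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Risk)
dds <- estimateSizeFactors(dds)

# PCA on rlog-transformed counts
rld <- rlog(dds)
plotPCA(rld, intgroup = "Risk")

# PCA on variance-stabilized counts
vsd <- varianceStabilizingTransformation(dds)
plotPCA(vsd, intgroup = "Risk")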

 

The two PCA plots I got look completely different, so I am not sure which transformation I should rely on for further analysis. 

Any help?

Thanks

 

modified 7 weeks ago • written 8 weeks ago by lirongrossmann

Thanks. I have two groups that I want to compare in my RNA-seq dataset: one group contains 6 samples, the other contains 100 samples.

When I run DESeq2 I get more than 1000 DE genes. But for some reason, when I plot the PCA using vsd and again using rlog, I see different separation of the groups.

Interestingly, when I narrowed my analysis to 6 vs 6, the plots do look similar.

Is this a known problem when comparing groups of highly unequal size?

Thanks!

 

 

written 7 weeks ago by lirongrossmann

Try blind=FALSE. The vignette recommends this when there are many large differences between groups.
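
A minimal sketch of what this suggestion could look like in code (not taken from the thread): both rlog() and varianceStabilizingTransformation() accept a blind argument. The simulated dataset from makeExampleDESeqDataSet() and its "condition" column are stand-ins (assumptions) for the dds object and the Risk column in the question, used only so the sketch runs on its own.

```r
library(DESeq2)

# Stand-in data (assumption): 1000 genes, 12 samples in two groups.
# In the thread this would be the dds built from ep and cp with design = ~ Risk.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# blind = FALSE: dispersions are estimated using the design formula,
# so large differences between groups are not treated as noise.
rld <- rlog(dds, blind = FALSE)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# The example data labels its groups in a column called "condition".
plotPCA(rld, intgroup = "condition")
plotPCA(vsd, intgroup = "condition")
```

With the data from the question, the same two calls would take the existing dds and intgroup = "Risk".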

written 7 weeks ago by Michael Love

Thank you. I tried it with the top 30 genes and it didn't work. I was wondering whether the highly unequal sizes of the two groups bias the PCA and the clustering, because when I narrow down to groups of equal size I do see clear separation (with both vsd and rlog).

I would really like to upload the plots, but I don't know which URL I should upload them to.

written 7 weeks ago by lirongrossmann
8 weeks ago, Michael Love (United States) wrote:

Can you describe the data or the plots? How large are the differences, what is the experimental design, etc.? There is some description of the differences in the vignette.

written 8 weeks ago by Michael Love

Thanks! 

I will try it. 

written 7 weeks ago by lirongrossmann
7 weeks ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

Can you try selecting the top 100 (or 200, 500) genes by baseMean? Or by rowVars of dds or vsd. When you apply PCA to all genes, the 'signal' may be dominated by precarious variations in the many genes with low counts.
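
One way to try this, as a rough sketch rather than a definitive recipe: keep only the ntop most variable genes of the transformed matrix and run PCA on those. The helper name pca_top_genes and the simulated matrix are hypothetical; with real data, mat would be assay(vsd) or assay(rld). The sketch selects by per-gene variance (computed with base R's apply() instead of rowVars to avoid an extra package); selecting by baseMean would follow the same pattern with a different ranking.

```r
# Hypothetical helper: PCA on the ntop most variable rows (genes) of a
# genes x samples matrix of transformed values, e.g. assay(vsd).
pca_top_genes <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)  # per-gene variance across samples
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  prcomp(t(mat[keep, , drop = FALSE]))  # samples as rows; centered, not scaled
}

# Simulated stand-in (assumption): 2000 genes, 6 vs 20 samples,
# with the first 50 genes shifted upward in the smaller group.
set.seed(1)
group <- factor(rep(c("high", "low"), c(6, 20)))
mat <- matrix(rnorm(2000 * 26), nrow = 2000)
mat[1:50, group == "high"] <- mat[1:50, group == "high"] + 3

# Compare how much variance PC1 captures for different numbers of genes.
for (ntop in c(100, 200, 500)) {
  pca <- pca_top_genes(mat, ntop)
  pct <- round(100 * pca$sdev[1]^2 / sum(pca$sdev^2), 1)
  cat("ntop =", ntop, " PC1 variance explained:", pct, "%\n")
}
```

plotPCA() in DESeq2 also has an ntop argument (default 500) that selects rows by variance, so plotPCA(vsd, intgroup = "Risk", ntop = 100) is a shorter route to the same kind of comparison.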

Please also try posting the PCA plots.

written 7 weeks ago by Wolfgang Huber