Different PCA plots using rlog and vsd on the same data set
@lirongrossmann-13938

Hi All,

I have been using DESeq2 to analyze a dataset and have run into a problem I am not sure how to solve.

I ran DESeq2 on my dataset with the following code:

 

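library(DESeq2)

# ep: count matrix (genes x samples); cp: sample table with a Risk column
dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Risk)
dds <- estimateSizeFactors(dds)

# Regularized log transformation, then PCA colored by Risk
rld <- rlog(dds)
plotPCA(rld, intgroup = "Risk")

# Variance stabilizing transformation, then PCA colored by Risk
vsd <- varianceStabilizingTransformation(dds)
plotPCA(vsd, intgroup = "Risk")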

 

The two PCA plots I got look completely different, so I am not sure which transformation I should rely on for further analysis. 

Any help?

Thanks

 

deseq2 rlog variancestabilizingtransformation plotpca pca

Thanks. I have two groups that I want to compare in my RNA-seq dataset: one group contains 6 samples and the other contains 100 samples.

When I run DESeq2 I get more than 1000 DE genes. But for some reason, when I plot the PCA using vsd and again using rlog, I see different separation of the groups.

Interestingly, when I narrowed my analysis to 6 vs 6, the plots do look similar.

Is this a known problem when comparing groups of highly unequal size?

Thanks!

 

 


Try blind=FALSE when calling rlog() and varianceStabilizingTransformation(). The vignette recommends this when many genes show large differences in counts due to the experimental design.
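As a rough illustration of this suggestion, the sketch below applies both transformations with blind = FALSE. It uses simulated counts from makeExampleDESeqDataSet() as a stand-in for the poster's data (an assumption made only so the example runs on its own), so the grouping column here is `condition` rather than `Risk`.

```r
library(DESeq2)

# Simulated stand-in for the real count data (assumption: not the poster's dataset)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)  # design is ~ condition

# blind = FALSE: dispersions are estimated using the design formula,
# so differences between groups are not counted as unwanted noise
rld <- rlog(dds, blind = FALSE)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

plotPCA(rld, intgroup = "condition")
plotPCA(vsd, intgroup = "condition")
```

With the objects from the original post, the same calls would presumably be rlog(dds, blind = FALSE) and varianceStabilizingTransformation(dds, blind = FALSE), followed by plotPCA(..., intgroup = "Risk").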


Thank you. I tried it with the top 30 genes and it didn't work. I was wondering whether the highly unequal sizes of the two groups bias the PCA and the clustering, because when I narrow down to groups of equal size I do see clear separation (with both vsd and rlog).

I would really like to upload the plots, but I don't know which URL I should upload them to.

@mikelove
United States

Can you describe the data or the plots? How large are the differences, what is the experimental design, etc.? The vignette has some description of the differences between the two transformations.


Thanks! 

I will try it. 

@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

Can you try selecting the top 100 (or 200, 500) genes by baseMean, or by rowVars of dds or vsd? When you apply PCA to all genes, the 'signal' may be dominated by unreliable variation in the many genes with low counts.

Please also try posting the PCA plots.
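A minimal sketch of the gene-selection idea above, using base R only. The helper `pca_top_genes` is a hypothetical name introduced for illustration, and the random matrix is a toy stand-in for a transformed expression matrix (an assumption, not the poster's data).

```r
# Hypothetical helper: PCA restricted to the ntop genes ranked by a per-gene score
pca_top_genes <- function(mat, score, ntop = 100) {
  keep <- order(score, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
  # prcomp expects samples in rows, so transpose; it centers columns by default
  prcomp(t(mat[keep, , drop = FALSE]))
}

# Toy stand-in for a transformed matrix (genes x samples); assumption only
set.seed(1)
mat <- matrix(rnorm(2000 * 12), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("s", 1:12)))

pca_mean <- pca_top_genes(mat, score = rowMeans(mat), ntop = 100)        # rank by mean level
pca_var  <- pca_top_genes(mat, score = apply(mat, 1, var), ntop = 100)   # rank by variance

head(pca_var$x[, 1:2])  # sample coordinates on PC1 and PC2
```

With DESeq2 objects, `mat` could plausibly be assay(vsd) or assay(rld), and a baseMean-style score could be rowMeans(counts(dds, normalized = TRUE)). Note that plotPCA() already restricts itself to the most variable genes through its `ntop` argument (500 by default), so plotPCA(vsd, intgroup = "Risk", ntop = 100) is a quicker way to try the variance-based variant.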
