Question

Different PCA plots using rlog and vsd on the same data set

lirongrossmann ▴ 80

@lirongrossmann-13938

Hi All,

I have been using DESeq2 to analyze a dataset and ran into a problem I am not sure how to solve.

I have been using the following code to run DESeq2 on my dataset:

dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~Risk)

dds <- estimateSizeFactors(dds)

rld <- rlog(dds)

plotPCA(rld, intgroup="Risk")

vsd <- varianceStabilizingTransformation(dds)

plotPCA(vsd, intgroup="Risk")

The two PCA plots I got look completely different, so I am not sure which transformation I should rely on for further analysis.

Any help?

Thanks

deseq2 rlog variancestabilizingtransformation plotpca pca • 4.3k views

ADD COMMENT • link 8.4 years ago lirongrossmann ▴ 80

Thanks. I have two groups that I want to compare in my RNA-seq dataset: one group contains 6 samples and the other contains 100 samples.

When I run DESeq2 I get more than 1000 DE genes. But for some reason, when I plot the PCA using vsd and again using rlog, I see different separation of the groups.

Interestingly, when I narrowed my analysis to 6 vs 6, the plots do look similar.

Is this a known problem when comparing groups of highly unequal size?

Thanks!

ADD REPLY • link 8.4 years ago lirongrossmann ▴ 80

Try blind=FALSE. This is recommended in the vignette when there are many large differences.
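A minimal sketch of what this suggestion could look like in practice. The simulated dataset from `makeExampleDESeqDataSet` and its `condition` column are stand-ins (assumptions) for the poster's `ep`, `cp`, and `Risk`, which are not available here.

```r
library(DESeq2)

# Simulated counts stand in for the poster's data (assumption):
# 1000 genes, 12 samples, two levels of `condition`.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds <- estimateSizeFactors(dds)

# blind = FALSE lets the transformation use the design (~ condition here)
# when estimating dispersions, so that large differences between groups
# are not counted as unwanted noise.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)

plotPCA(vsd, intgroup = "condition")
plotPCA(rld, intgroup = "condition")
```

With the original objects, the same calls would presumably take `intgroup = "Risk"` instead of `"condition"`.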

ADD REPLY • link 8.4 years ago Michael Love 43k

Thank you. I tried to use it with the top 30 genes and it didn't work. I was wondering whether the highly unequal size of the two compared groups biases the PCA and the clustering, because when I narrow down to groups of equal size I do see clear separation (with both vsd and rlog).

I would really like to upload the plots, but I don't know which URL I should upload them to.

ADD REPLY • link 8.4 years ago lirongrossmann ▴ 80

score 0 · Answer 1 · 2017-09-24

Michael Love 43k

@mikelove

United States

Can you describe the data or the plots? How large are the differences, what is the experimental design, etc.? There is some description of differences in the vignette.

ADD COMMENT • link 8.4 years ago Michael Love 43k

Thanks!

I will try it.

ADD REPLY • link 8.4 years ago lirongrossmann ▴ 80

score 0 · Answer 2 · 2017-09-28

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

EMBL European Molecular Biology Laborat…

Can you try selecting the top 100 (or 200, 500) genes by baseMean? Or also by rowVars of dds, vsd. When you apply PCA to all genes, the 'signal' may be dominated by precarious variations in the many genes with low counts.

Please also try posting the PCA plots.
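The gene selection suggested above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's analysis: the data are simulated with `makeExampleDESeqDataSet`, `n_top` is an arbitrary choice, and base R `apply(..., 1, var)` is used in place of a `rowVars` helper to keep the example free of extra packages.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds <- estimateSizeFactors(dds)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

n_top <- 200  # illustrative; try 100, 200, 500

# Option 1: rank genes by mean normalized count (the quantity DESeq2
# reports as baseMean).
base_mean <- rowMeans(counts(dds, normalized = TRUE))
top_by_mean <- head(order(base_mean, decreasing = TRUE), n_top)

# Option 2: rank genes by the variance of their transformed values.
row_var <- apply(assay(vsd), 1, var)
top_by_var <- head(order(row_var, decreasing = TRUE), n_top)

# PCA on samples (rows = samples after transposing), using only the
# selected genes.
pca_mean <- prcomp(t(assay(vsd)[top_by_mean, ]))
pca_var  <- prcomp(t(assay(vsd)[top_by_var, ]))

plot(pca_mean$x[, 1:2], col = as.integer(dds$condition), pch = 19,
     main = "Top genes by baseMean")
plot(pca_var$x[, 1:2], col = as.integer(dds$condition), pch = 19,
     main = "Top genes by variance")
```

Note that `plotPCA` already restricts itself to the highest-variance genes through its `ntop` argument (default 500), so `plotPCA(vsd, intgroup = "condition", ntop = n_top)` is a shorter route to the variance-based version.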

ADD COMMENT • link 8.4 years ago Wolfgang Huber ★ 13k