Question: Question about PCA and transformed data in DESeq2
Dear Michael, Simon, Wolfgang and others,

I am a little bit confused about the count data transformations and the Principal Component Analysis in DESeq2.

In the last vignette, the example on pages 18-19 shows a PCA plot of the samples, obtained with regularized log transformed data (rld). But in the plotPCA R documentation, it is written to use a SummarizedExperiment with transformed data produced by 'varianceStabilizingTransformation' (vst). This is quite discrepant, so I wonder which type of transformation I should use.

Moreover, when applied to my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
- when no transformation is applied, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
- when transformed with r-log (rld), axis 1 = unknown factor and axis 2 = pathology
- when transformed with variance (vst), axis 1 = sex (girls vs boys), axis 2 = unknown factor

So, I wonder if the data are driven by the pathology or by the sex of the subjects? Is it incorrect to use untransformed data in PCA? I don't really understand the usefulness of transforming the data since, as far as I understand, it is not used in DE analysis afterwards.

Thank you in advance for your reply.
Best regards,
Amandine

-----
Amandine Fournier
Lyon Neuroscience Research Center and Lyon Civil Hospitals (France)
deseq2 • 746 views
modified 5.5 years ago by Michael Love • written 5.5 years ago by amandine.fournier@chu-lyon.fr
Answer: Question about PCA and transformed data in DESeq2
Michael Love (United States) wrote, 5.5 years ago:
hi Amandine,

On Oct 10, 2013 5:17 AM, <amandine.fournier@chu-lyon.fr> wrote:
> Dear Michael, Simon, Wolfgang and others,
>
> I am a little bit confused about the count data transformations and the Principal Component Analysis in DESeq2.
>
> In the last vignette, the example on pages 18-19 shows a PCA plot of the samples, obtained with regularized log transformed data (rld).
> But in the plotPCA R documentation, it is written to use a SummarizedExperiment with transformed data produced by 'varianceStabilizingTransformation' (vst).
> This is quite discrepant, so I wonder which type of transformation I should use.

Thanks for pointing this out. I will fix the plotPCA manual page. The function is written to use any SummarizedExperiment object, produced by either function.

> Moreover, when applied to my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
> - when no transformation is applied, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
> - when transformed with r-log (rld), axis 1 = unknown factor and axis 2 = pathology
> - when transformed with variance (vst), axis 1 = sex (girls vs boys), axis 2 = unknown factor

The order of the principal components can change with slight fluctuations in the data, so this is not necessarily an indication of something wrong. If PC1 explains 30% of variance and PC2 explains 29%, it is easy for the order to swap.

> So, I wonder if the data are driven by the pathology or by the sex of the subjects? Is it incorrect to use untransformed data in PCA?
> I don't really understand the usefulness of transforming the data since, as far as I understand, it is not used in DE analysis afterwards.

The transformations are useful for examining the samples for outliers. With the untransformed counts, the variance is dominated by a few large counts. With log or shifted log, a lot of variance can come from low count genes. The transformations help to compare samples with priority on genes which are (hopefully) more biologically relevant, and not on differences due to technical artifacts or "shot noise".

Mike
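To make the point about variance concrete, here is a minimal sketch in plain Python. It is not DESeq2 code and does not run rlog or vst. The count matrix and the helper `variance_share` are invented for illustration. It relies on the fact that PCA on centered, unscaled data partitions the sum of the per-gene variances, so a gene's share of that sum indicates how much it can influence the leading components.

```python
import math
from statistics import variance

# Hypothetical toy counts (not real data): genes in rows, 4 samples each.
counts = {
    "high_gene": [10000, 12000, 9000, 11000],
    "mid_gene": [100, 140, 90, 120],
    "low_gene": [0, 3, 1, 6],
}


def variance_share(matrix, transform):
    """Return each gene's fraction of the summed per-gene sample variance."""
    per_gene = {
        gene: variance([transform(x) for x in row])
        for gene, row in matrix.items()
    }
    total = sum(per_gene.values())
    return {gene: v / total for gene, v in per_gene.items()}


# Untransformed counts: variance grows with the mean.
raw = variance_share(counts, lambda x: x)

# Shifted log, log2(count + 1): small counts now fluctuate the most.
logged = variance_share(counts, lambda x: math.log2(x + 1))

for gene in counts:
    print(f"{gene:10s} raw={raw[gene]:.4f} log2(x+1)={logged[gene]:.4f}")
```

With these made-up numbers, the highest-count gene accounts for nearly all of the variance in the raw counts, while after log2(x + 1) the lowest-count gene accounts for most of it. Neither extreme is what one wants to drive a sample-level PCA, which is the motivation for the transformations discussed above.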