Question: MDS plotting within the limma/voom workflow
22 days ago by
Ben0
United States
Ben0 wrote:

Hi!

This is a question regarding the limma/voom workflow for analyzing RNA-seq datasets.

According to the workflow described in "RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR", MDS plotting is performed on the DGEList object which was passed through the filterByExpr and calcNormFactors functions. It is performed before executing the voom function.

If I understood correctly, voom removes heteroscedasticity from the count data.

In the DESeq2 vignette ("RNA-seq workflow: gene-level exploratory analysis and differential expression"), on the other hand, MDS plotting is performed on the vst/rlog-transformed data; both transformations also remove heteroscedasticity. The vignette states that MDS plotting requires removing heteroscedasticity.

So to me, right now, the limma/voom workflow appears to contradict this statement, since voom (which removes heteroscedasticity) is run after the MDS plotting.

I hope someone can explain why the two workflows seem to differ here.

Thanks much!

Answer: MDS plotting within the limma/voom workflow
22 days ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

In the RNA-seq analysis is easy as 1-2-3 workflow, plotMDS is applied to logCPM values that have been computed using edgeR's cpm function with a large prior count. Here the mean-variance relationship inherent in the original counts has been stabilized first by logging and secondly by the large prior count, which damps down the variability of logCPMs for small counts. The workflow explains this in some detail. You can see from this workflow and from the edgeR workflow that the plot works very well indeed.
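To make the damping effect of the prior count concrete, here is a minimal sketch of the log-CPM arithmetic in plain Python. It is not edgeR's code: the function name `log_cpm` and the toy counts are hypothetical, and the scaling of the prior count by relative library size follows my understanding of how edgeR's cpm behaves with log=TRUE, so treat that detail as an assumption.

```python
import math

def log_cpm(counts, lib_sizes, prior_count=2.0):
    """log2 counts-per-million with a prior count (rows = genes, columns = samples)."""
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    result = []
    for row in counts:
        new_row = []
        for count, lib in zip(row, lib_sizes):
            # Assumption: the prior count is scaled by relative library size,
            # and the library size is augmented by twice the scaled prior.
            scaled_prior = prior_count * lib / mean_lib
            new_row.append(math.log2((count + scaled_prior) / (lib + 2 * scaled_prior) * 1e6))
        result.append(new_row)
    return result

# Hypothetical toy data: a low-count gene and a high-count gene,
# each with a 4-fold raw difference between two samples of equal depth.
counts = [[1, 4], [1000, 4000]]
lib_sizes = [1_000_000, 1_000_000]

for prior in (0.25, 2.0, 5.0):
    lcpm = log_cpm(counts, lib_sizes, prior_count=prior)
    low_diff = lcpm[0][1] - lcpm[0][0]
    high_diff = lcpm[1][1] - lcpm[1][0]
    print(f"prior={prior}: low-count diff={low_diff:.2f}, high-count diff={high_diff:.2f}")
```

As the prior count grows, the log2 difference for the low-count gene shrinks toward zero while the high-count gene stays close to 2, which is the stabilizing behaviour described above.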

BTW, I think you may be conflating some different ideas. First of all, voom and vst are designed to estimate mean-variance relationships, not to remove heteroscedasticity. Different genes are still allowed to have different variances (i.e., to be heteroscedastic), although gene to gene heteroscedasticity will be reduced to the extent that it depends on abundance.

Second, voom does not remove the mean-variance relationship, it merely estimates the trend and accounts for it in the differential expression calculations.

voom examines the variability of genuine replicate data after the data have been adjusted for treatment conditions. It does not conflate systematic treatment effects with residual variability.
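The distinction between estimating the trend and transforming the data can be sketched as follows. This is a toy illustration, not voom itself: `toy_trend` is a hypothetical stand-in for the smooth curve that voom fits to square-root residual standard deviations, and the numbers are invented. The only part taken from voom's design is that the weight is the inverse of the predicted variance, i.e. the predicted sqrt(sd) raised to the power -4.

```python
import math

def toy_trend(fitted_log_count):
    """Hypothetical mean-variance trend: predicted sqrt(sd) falls as abundance rises."""
    return 0.5 + 1.0 / (1.0 + math.exp(fitted_log_count - 4.0))

def precision_weight(fitted_log_count):
    # The trend predicts sqrt(sd); variance = sd^2 = (sqrt(sd))^4, so weight = 1 / variance.
    return toy_trend(fitted_log_count) ** -4

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical log-CPM values for one gene in three replicates whose
# library sizes differ, so their fitted log-counts differ too.
log_cpm_values = [3.0, 3.4, 5.0]
fitted_log_counts = [2.0, 2.5, 6.0]

weights = [precision_weight(f) for f in fitted_log_counts]
unweighted = sum(log_cpm_values) / len(log_cpm_values)
weighted = weighted_mean(log_cpm_values, weights)

print("weights:", [round(w, 3) for w in weights])
print("unweighted mean:", round(unweighted, 3))
print("weighted mean:", round(weighted, 3))
```

The log-CPM values are left exactly as they were; only their influence on the fitted mean changes. That is why the output of voom is not a "homoscedastic version" of the data that one would want to feed into an MDS plot.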

The purpose of the MDS plot is to explore the data before any models are fitted. The idea is to plot something as close to the raw data as is possible, although normalization for library sizes is necessary. IMO it is not desirable to transform the data with a model-based method prior to plotting.

I do understand that MDS plots should be based on "close-to-raw" data, meaning data from before any model fitting is performed. That does not seem to rule out making MDS plots from either vst- or voom-transformed data, since both of these functions are applied prior to the model-fitting step.

Regarding voom and the fact that this function is not removing heteroscedasticity: I was assuming that voom is removing heteroscedasticity since the title of the paragraph where voom is applied says exactly this ("Removing heteroscedascity from count data", from: 1-2-3 limma-voom workflow). But this clarification is helpful! Thanks!

So in the end, and please correct me if I am wrong: voom and vst do not remove heteroscedasticity but reduce it, and the difference between the DESeq2 and limma/voom workflows regarding whether vst/voom is run before MDS/PCA plotting comes down to the fact that, in the limma/voom workflow, the variance stabilization is done within the cpm function.

Please let me know if my "take-home-message" is correct.

Thanks!

Yes, the cpm and vst functions both transform the counts to a log scale suitable for a PCA or MDS plot. They have the same aim.

In both cases, these transformations are independent of the DE analysis. The output of cpm/vst is not used for the DE analysis, nor is the output of the DE functions like voom used for the MDS plot. So the MDS plot will be the same whether it is done before or after the DE analysis.
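A small sketch may help show why the plot cannot depend on the DE analysis. As I understand limma's plotMDS defaults, the distance between two samples is the root-mean-square of the largest absolute log2 fold changes between them (the "leading log-fold-change"). The function name, sample names and values below are hypothetical; the point is only that the distance is computed from the log-CPM values alone, with no design matrix or fitted model involved.

```python
import math

def leading_logfc_distance(sample_a, sample_b, top=500):
    """Root-mean-square of the `top` largest absolute log2 fold changes between two samples."""
    diffs = sorted((abs(a - b) for a, b in zip(sample_a, sample_b)), reverse=True)[:top]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical log-CPM values: one list of gene values per sample.
lcpm = {
    "ctrl_1": [5.0, 2.0, 7.0, 1.0],
    "ctrl_2": [5.1, 2.2, 6.9, 1.3],
    "trt_1": [8.0, 2.1, 4.0, 1.1],
}

names = list(lcpm)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        # top=2 only because the toy data has four genes.
        print(a, b, round(leading_logfc_distance(lcpm[a], lcpm[b], top=2), 3))
```

The two control samples end up close together and far from the treated sample, and nothing in the calculation would change if a DE analysis were run first.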

I take your point about the title of Section 6.2 of the 1-2-3 workflow. I guess what was meant is that voom removes the mean-variance trend from the SA plot in Figure 4. Also the voom precision weights allow limma to assume that the unknown variances are equal across samples for each gene. It does not mean though that voom produces a new version of the data with all the signal preserved and all heteroscedasticity removed.

OK, great. Thanks for your final comment. I think that clarifies the differences between the limma/voom and DESeq2 workflows regarding MDS plotting.

Thanks much!