Question

limma voom Mean-variance trend plot: x-axis?

2

Entering edit mode

AS ▴ 70

@as-20608

Last seen 5.0 years ago

Hi all,

I've been following the "RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR" vignette for usign limma on miRNA-seq data. I've gotten to the point of comparing the mean-variances pre- and post-filtering my data based on CPM counts.

I'm a bit bugged by the differences in x-axis labels between the plot generated by voom() and plotSA() and I'm trying to understand if they are the same thing. The y-axis at least was easy--sqrt(sigma) is the same as sqrt(standard deviation).

As far as I can tell from the manuals, the voom plot x-axis: log2( count size + 0.5) is based on log2(CPM +0.5), with the 0.5 being the offset to prevent taking log(0). That's fine. However, I assume what's plotted is the average of log2(CPM+0.5) over all of the samples? This isn't explicitly stated anywhere.

Similarly, I think the plotSA x-axis: Average log-expression is based on the $Amean values (e2fit$Amean in my code) which were generated from lmFit(), but I don't totally understand how they were calculated. Is it the average of log2(CPM)? was that 0.5 offset also included?

My goal is to relabel the x-axes to match if they actually represent the same thing so I can avoid confusion about why I'm comparing plots with 2 different things. If not, I want to understand how they're different so I can explain why they are still comparable. Thanks in advance.

par(mfrow=c(1,2))
v2 <- voom(y2, design, plot=TRUE)

v2fit <- lmFit(v2, design)
e2fit <- eBayes(v2fit)
plotSA(e2fit, main="Final model: Mean-variance trend", ylab = "Sqrt( standard deviation )")

plots

limma • 4.7k views

ADD COMMENT • link 5.0 years ago AS ▴ 70

2

Entering edit mode

You've included a code line lines(predict(loess ... but without showing the plot line produced by it. Could you please either delete the code or else show the output. The output isn't necessary for your question, so I suggest deleting.

ADD REPLY • link 5.0 years ago Gordon Smyth 50k

2

Entering edit mode

Thanks. This plot was saved prior to the loess line.

ADD REPLY • link 5.0 years ago AS ▴ 70

score 5 · Answer 1 · 2019-04-24

The purpose of the workflow article RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR is to guide you through a practical analysis, not to provide mathematical formulas. From your remarks I assume that the workflow has been successful and that you have not encountered any problems.

If you want to understand the mathematics, both the workflow and the limma documentation provide references to articles where the mathematics in explained in detail. For example, typing help("voom") brings up the help package for voom, which refers you to the Law et al (2014) paper, which is available from http://www.statsci.org/smyth/pubs/VoomPreprint.pdf The article gives an explicit mathematical formula for the x-axis of the voom plot.

The x-axis labels for the voom and plotSA plots seem to me to be informative and appropriate. The labels are different because the quantities are different. The x-axis for the voom plot says log(count size) because that's what it is, not log CPM. You can see that the range of x-values for the two plots are not the same, so they are obviously different quantities. The difference between the two plots corresponds to the two methods voom and limma-trend that are compared in the Law et al (2014) article. Code for the two methods are given in Chapter 15 of the limma User's Guide. The voom method applies different precision weights to counts of different sizes even when the logCPM values might be the same. This allows it to downweight counts from smaller libraries. So it has to work on count sizes and not CPM.

The difference between voom and limma-trend is somewhat subtle. Not all the subtlety can be explained in the 1-2-3 workflow, but I wish you wouldn't be quite so ready to assume we have carelessly labelled the same quantity differently in different plots and that you would need to rewrite our package for us.

In a standard voom analysis the plotSA would not be used at all. voom already incorporates the mean-variance trend into the weights, meaning that there is nothing to see in the SA plot. The SA plot was included in the 1-2-3 workflow just for extra exposition, to demonstrate that the trend is gone. You don't need to keep demonstrating that for every new analysis you might do. If you need to teach voom to a beginner audience, I would suggest omitting the SA plot entirely.

The quantity log(CPM + 0.5) is not used anywhere in the limma package. You can read the correct formula in the article above and and it is briefly indicated on the voom help page also, which tells you that the offset of 0.5 is applied to the counts rather than to the CPMs.

lmFit, eBayes and plotSA are core limma functions that are not specific to voom output or to RNA-seq analysis. The x-axis of the plotSA plot is the same as the AveExpr column returned by topTable and it is documented by help("topTable") or by help("MArrayLM-class"). It is also documented in the limma User's Guide. It is simply the row-wise averages of the log-expression values input to lmFit. In the case of a voom analysis, it is indeed simply the average of the logCPM values as returned by voom.