Question: limma voom Mean-variance trend plot: x-axis?
0
6 months ago by
AS0
AS0 wrote:

Hi all,

I've been following the "RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR" vignette for usign limma on miRNA-seq data. I've gotten to the point of comparing the mean-variances pre- and post-filtering my data based on CPM counts.

I'm a bit bugged by the differences in x-axis labels between the plot generated by voom() and plotSA() and I'm trying to understand if they are the same thing. The y-axis at least was easy--sqrt(sigma) is the same as sqrt(standard deviation).

As far as I can tell from the manuals, the voom plot x-axis: log2( count size + 0.5) is based on log2(CPM +0.5), with the 0.5 being the offset to prevent taking log(0). That's fine. However, I assume what's plotted is the average of log2(CPM+0.5) over all of the samples? This isn't explicitly stated anywhere.

Similarly, I think the plotSA x-axis: Average log-expression is based on the $Amean values (e2fit$Amean in my code) which were generated from lmFit(), but I don't totally understand how they were calculated. Is it the average of log2(CPM)? was that 0.5 offset also included?

My goal is to relabel the x-axes to match if they actually represent the same thing so I can avoid confusion about why I'm comparing plots with 2 different things. If not, I want to understand how they're different so I can explain why they are still comparable. Thanks in advance.

par(mfrow=c(1,2))
v2 <- voom(y2, design, plot=TRUE)

v2fit <- lmFit(v2, design)
e2fit <- eBayes(v2fit)
plotSA(e2fit, main="Final model: Mean-variance trend", ylab = "Sqrt( standard deviation )")


limma • 408 views
modified 6 months ago • written 6 months ago by AS0

You've included a code line lines(predict(loess ... but without showing the plot line produced by it. Could you please either delete the code or else show the output. The output isn't necessary for your question, so I suggest deleting.

Thanks. This plot was saved prior to the loess line.

Answer: limma voom Mean-variance trend plot: x-axis?
1
6 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

The purpose of the workflow article RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR is to guide you through a practical analysis, not to provide mathematical formulas. From your remarks I assume that the workflow has been successful and that you have not encountered any problems.

If you want to understand the mathematics, both the workflow and the limma documentation provide references to articles where the mathematics in explained in detail. For example, typing help("voom") brings up the help package for voom, which refers you to the Law et al (2014) paper, which is available from http://www.statsci.org/smyth/pubs/VoomPreprint.pdf The article gives an explicit mathematical formula for the x-axis of the voom plot.

The x-axis labels for the voom and plotSA plots seem to me to be informative and appropriate. The labels are different because the quantities are different. The x-axis for the voom plot says log(count size) because that's what it is, not log CPM. You can see that the range of x-values for the two plots are not the same, so they are obviously different quantities. The difference between the two plots corresponds to the two methods voom and limma-trend that are compared in the Law et al (2014) article. Code for the two methods are given in Chapter 15 of the limma User's Guide. The voom method applies different precision weights to counts of different sizes even when the logCPM values might be the same. This allows it to downweight counts from smaller libraries. So it has to work on count sizes and not CPM.

The difference between voom and limma-trend is somewhat subtle. Not all the subtlety can be explained in the 1-2-3 workflow, but I wish you wouldn't be quite so ready to assume we have carelessly labelled the same quantity differently in different plots and that you would need to rewrite our package for us.

In a standard voom analysis the plotSA would not be used at all. voom already incorporates the mean-variance trend into the weights, meaning that there is nothing to see in the SA plot. The SA plot was included in the 1-2-3 workflow just for extra exposition, to demonstrate that the trend is gone. You don't need to keep demonstrating that for every new analysis you might do. If you need to teach voom to a beginner audience, I would suggest omitting the SA plot entirely.

The quantity log(CPM + 0.5) is not used anywhere in the limma package. You can read the correct formula in the article above and and it is briefly indicated on the voom help page also, which tells you that the offset of 0.5 is applied to the counts rather than to the CPMs.

lmFit, eBayes and plotSA are core limma functions that are not specific to voom output or to RNA-seq analysis. The x-axis of the plotSA plot is the same as the AveExpr column returned by topTable and it is documented by help("topTable") or by help("MArrayLM-class"). It is also documented in the limma User's Guide. It is simply the row-wise averages of the log-expression values input to lmFit. In the case of a voom analysis, it is indeed simply the average of the logCPM values as returned by voom.

Sorry for the offense--it was completely unintentional. I've just started to work in this sphere so I'm still trying to get a handle on code, math, and statistics. Given the sqrt(sigma) y-axis label on the plotSA graph, I just wasn't sure if it was just a different way expressing the same thing. That being said, it's my fault for forgetting there are published literature for each of these packages that goes into the detailed math, besides just the help code, vignettes, and user guides.

Based on what you said, I think I misunderstood the point of that exposition then. I thought it was suggesting that in an analysis, it's worth comparing the 2 plots to check that the variance trend had disappeared/decreased as proof of principle, but I think you're saying that's unnecessary. I was considering it in the context of explaining my results to collaborators/PIs who have no experience with these packages and their output, who would inevitably ask questions like why are the axes different and what does it mean--which I wasn't prepared to answer.

And yes, the workflow has been successful. The results themselves are less exciting, but for that reason I wanted to be able to show the variance, density, and MDS plots to explain that I think the lack of exciting significant results has more to do with sample size than problems with the analysis itself.

In any case, thanks for the detailed answer! You guys are really good at keeping tabs on questions.