6 months ago by

Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

The purpose of the workflow article *RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR* is to guide you through a practical analysis, not to provide mathematical formulas. From your remarks I assume that the workflow has been successful and that you have not encountered any problems.

If you want to understand the mathematics, both the workflow and the limma documentation provide references to articles where the mathematics in explained in detail. For example, typing `help("voom")`

brings up the help package for voom, which refers you to the Law et al (2014) paper, which is available from http://www.statsci.org/smyth/pubs/VoomPreprint.pdf The article gives an explicit mathematical formula for the x-axis of the voom plot.

The x-axis labels for the `voom`

and `plotSA`

plots seem to me to be informative and appropriate. The labels are different because the quantities are different. The x-axis for the voom plot says log(count size) because that's what it is, not log CPM. You can see that the range of x-values for the two plots are not the same, so they are obviously different quantities. The difference between the two plots corresponds to the two methods *voom* and *limma-trend* that are compared in the Law et al (2014) article. Code for the two methods are given in Chapter 15 of the limma User's Guide. The *voom* method applies different precision weights to counts of different sizes even when the logCPM values might be the same. This allows it to downweight counts from smaller libraries. So it has to work on count sizes and not CPM.

The difference between *voom* and *limma-trend* is somewhat subtle. Not all the subtlety can be explained in the *1-2-3* workflow, but I wish you wouldn't be quite so ready to assume we have carelessly labelled the same quantity differently in different plots and that you would need to rewrite our package for us.

In a standard voom analysis the `plotSA`

would not be used at all. `voom`

already incorporates the mean-variance trend into the weights, meaning that there is nothing to see in the SA plot. The SA plot was included in the *1-2-3* workflow just for extra exposition, to demonstrate that the trend is gone. You don't need to keep demonstrating that for every new analysis you might do. If you need to teach voom to a beginner audience, I would suggest omitting the SA plot entirely.

The quantity log(CPM + 0.5) is not used anywhere in the limma package. You can read the correct formula in the article above and and it is briefly indicated on the voom help page also, which tells you that the offset of 0.5 is applied to the counts rather than to the CPMs.

`lmFit`

, `eBayes`

and `plotSA`

are core limma functions that are not specific to `voom`

output or to RNA-seq analysis. The x-axis of the `plotSA`

plot is the same as the `AveExpr`

column returned by `topTable`

and it is documented by `help("topTable")`

or by `help("MArrayLM-class")`

. It is also documented in the limma User's Guide. It is simply the row-wise averages of the log-expression values input to `lmFit`

. In the case of a voom analysis, it is indeed simply the average of the logCPM values as returned by `voom`

.

You've included a code line

`lines(predict(loess ...`

but without showing the plot line produced by it. Could you please either delete the code or else show the output. The output isn't necessary for your question, so I suggest deleting.39kThanks. This plot was saved prior to the loess line.

0