4 weeks ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
The purpose of the workflow article RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR is to guide you through a practical analysis, not to provide mathematical formulas. From your remarks I assume that the workflow has been successful and that you have not encountered any problems.
If you want to understand the mathematics, both the workflow and the limma documentation provide references to articles where the mathematics in explained in detail. For example, typing
help("voom") brings up the help package for voom, which refers you to the Law et al (2014) paper, which is available from http://www.statsci.org/smyth/pubs/VoomPreprint.pdf The article gives an explicit mathematical formula for the x-axis of the voom plot.
The x-axis labels for the
plotSA plots seem to me to be informative and appropriate. The labels are different because the quantities are different. The x-axis for the voom plot says log(count size) because that's what it is, not log CPM. You can see that the range of x-values for the two plots are not the same, so they are obviously different quantities. The difference between the two plots corresponds to the two methods voom and limma-trend that are compared in the Law et al (2014) article. Code for the two methods are given in Chapter 15 of the limma User's Guide. The voom method applies different precision weights to counts of different sizes even when the logCPM values might be the same. This allows it to downweight counts from smaller libraries. So it has to work on count sizes and not CPM.
The difference between voom and limma-trend is somewhat subtle. Not all the subtlety can be explained in the 1-2-3 workflow, but I wish you wouldn't be quite so ready to assume we have carelessly labelled the same quantity differently in different plots and that you would need to rewrite our package for us.
In a standard voom analysis the
plotSA would not be used at all.
voom already incorporates the mean-variance trend into the weights, meaning that there is nothing to see in the SA plot. The SA plot was included in the 1-2-3 workflow just for extra exposition, to demonstrate that the trend is gone. You don't need to keep demonstrating that for every new analysis you might do. If you need to teach voom to a beginner audience, I would suggest omitting the SA plot entirely.
The quantity log(CPM + 0.5) is not used anywhere in the limma package. You can read the correct formula in the article above and and it is briefly indicated on the voom help page also, which tells you that the offset of 0.5 is applied to the counts rather than to the CPMs.
plotSA are core limma functions that are not specific to
voom output or to RNA-seq analysis. The x-axis of the
plotSA plot is the same as the
AveExpr column returned by
topTable and it is documented by
help("topTable") or by
help("MArrayLM-class"). It is also documented in the limma User's Guide. It is simply the row-wise averages of the log-expression values input to
lmFit. In the case of a voom analysis, it is indeed simply the average of the logCPM values as returned by