dream vignette says limma-voom repeated measures two-group comparison needs two duplicateCorrelation rounds which differs from limma user's guide and manual
3
0
Entering edit mode
hermidalc ▴ 10
@hermidalc-21769
Last seen 8 months ago
National Cancer Institute and Universit…

In the dream vignette it's described that for an RNA-seq repeated measure design the limma analysis needs two duplicateCorrelation rounds, though this differs from the limma user's guide and manual. Below for convenience is the design:

> metadata
Individual Disease Experiment Sex DiseaseSubtype
sample_01               ID1       0  sample_01   F              0
sample_02               ID1       0  sample_02   F              0
sample_03               ID2       0  sample_03   M              0
sample_04               ID2       0  sample_04   M              0
sample_05               ID3       0  sample_05   F              0
sample_06               ID3       0  sample_06   F              0
sample_07               ID4       0  sample_07   M              0
sample_08               ID4       0  sample_08   M              0
sample_09               ID5       0  sample_09   F              0
sample_10               ID5       0  sample_10   F              0
sample_11               ID6       0  sample_11   M              0
sample_12               ID6       0  sample_12   M              0
sample_13               ID7       1  sample_13   F              1
sample_14               ID7       1  sample_14   F              1
sample_15               ID8       1  sample_15   M              1
sample_16               ID8       1  sample_16   M              1
sample_17               ID9       1  sample_17   F              1
sample_18               ID9       1  sample_18   F              1
sample_19              ID10       1  sample_19   M              2
sample_20              ID10       1  sample_20   M              2
sample_21              ID11       1  sample_21   F              2
sample_22              ID11       1  sample_22   F              2
sample_23              ID12       1  sample_23   M              2
sample_24              ID12       1  sample_24   M              2


and the limma two-round duplicateCorrelation analysis code from the vignette, where the DE test is an ordinary two-group comparison ~Disease:

# apply duplicateCorrelation is two rounds
design = model.matrix( ~ Disease, metadata)
vobj_tmp = voom( geneExpr, design, plot=FALSE)
dupcor <- duplicateCorrelation(vobj_tmp,design,block=metadata$Individual) # run voom considering the duplicateCorrelation results # in order to compute more accurate precision weights # Otherwise, use the results from the first voom run vobj = voom( geneExpr, design, plot=FALSE, block=metadata$Individual, correlation=dupcor$consensus) # Estimate linear mixed model with a single variance component # Fit the model for each gene, dupcor <- duplicateCorrelation(vobj, design, block=metadata$Individual)

# But this step uses only the genome-wide average for the random effect
fitDupCor <- lmFit(vobj, design, block=metadata$Individual, correlation=dupcor$consensus)

# Fit Empirical Bayes for moderated t-statistics
fitDupCor <- eBayes( fitDupCor )


This differs from the limma user's guide which, for similar designs and DE test questions, describes only one duplicateCorrelation round. Also, the limma manual says that specifying parameters like block and correlation in voom only passes them to lmFit and voom doesn't use it for precision weights calculation. This would also make sense for only needing one round. Which way is the correct way?

I compared the vignette example with two rounds and the limma guide with one round and the p-values are not identical, the one-round has slightly higher p-values and slightly lower log-odds.

2
Entering edit mode
@james-w-macdonald-5106
Last seen 19 hours ago
United States
1
Entering edit mode

Might be a good idea for limma team, when they have the time, to update the relevant areas in the limma user's guide with this new recommended double approach and also in the limma reference manual voom section regarding ... additional parameters, because it made it seem setting parameters like block and correlation would only get them passed to lmFit not actually being used in precision weights calc in voom.

1
Entering edit mode

I can't speak for the limma team and what they will or will not update, but you seem confused.

The fact that they are passed to lmFit, and hence use a linear mixed model instead of a conventional ANOVA is how the block and correlation arguments affect the estimation of precision weights. That's the whole point of the two-step approach (see the post by Bernd Klaus in the thread I reference above). So when the help page says

     ...: other arguments are passed to lmFit.



That's completely accurate. One could argue that there could be other information about why you might want to do the two-step thing when using a linear mixed model, but there's always the tension between adding all possible relevant information in the age of TL;DR, and being concise because people won't read things.

0
Entering edit mode

My apologies yes I found that phrase confusing, since I thought it meant that passing block and correlation options to voom would mean they weren't actually used there and simply passed to the subsequent call of lmFit(v, ...) with the voom result so you wouldn't need to specify block and correlation again. But I see in the examples you linked above you pass block and correlation to both voom and lmFit for the second round so voom is using them not simply passing them.

1
Entering edit mode

Yes. If you want lmFit to use a mixed model you have to supply a block and correlation argument. That's just how it works (e.g., that's how you tell lmFit that you want a mixed model).

The voom function is only used to estimate observation-level weights so you can use conventional linear models (which is what lmFit does), and the two-step procedure is simply done to update the weights to be more amenable to the mixed model rather than based on a fixed effects only model.

It's true that if you have values in the 'weights' component of an EList then lmFit will automatically fit a weighted linear model, so one might assume that paradigm extends to other things like fitting a mixed model because of something that voom might return.

However, what lmFit does is based on what getEAWP provides it, and there is documentation for that function (and the help page for the EList class, which is what voom returns), so there's no need to assume anything - you can just read those help pages along with the one for lmFit and know for sure what's up.

1
Entering edit mode

Leandro, it is a good start that you are reading the manual help page for voom, but you need to read the "Value" section of the page as well. You can see from this section that voom() only returns expression values and weights. It does not return blocks or correlations, so these could not possibly be passed on to any downstream functions. There is no way in R to pass on any information that is not stored in the output object.

0
Entering edit mode

Very sorry, I do usually read the return value description I simply forgot in this case. Thank you for the help!

1
Entering edit mode
@gabrielhoffman-8391
Last seen 5 months ago
United States

When I referred to two rounds of duplicateCorrelation(), I was referring to Gordon Smyth's suggestion rather than the original limma manual. The discrepancy is not ideal.

Since voom() uses lmFit() internally to get the residuals, specifying a block will be used to compute the residuals and thus the precision weights. The confusion is that following the call to voom() the user must call lmFit() directly. This call must explicitly specify block and correlation if desired; it will not inherit it from he previous voom() call as far as I can tell.

0
Entering edit mode

Thank you both James and Gabriel. It's much clearer now.

0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

James Macdonald's comments are on the mark but I think it may be helpful if I give a summary of the situation.

The limma Manual and User's Guide do not actually give any explicit advice about how to use duplicateCorrelation in conjunction with voom, except from carefully documenting the input arguments and output object from each of the functions. The documentation does not give any recommendation regarding how many times to run voom and duplicateCorrelation, so there can't be any discrepancy between the limma documentation and anything else.

We have however given advice on this support forum, and we have been giving the same advice for over six years:

The reason why the limma User's Guide gives examples with only one duplicateCorrelation round is because only one round is needed in the contexts covered by the examples. In the examples, no estimated weights are involved.

voom produces an EList data object with the components documented on the help page. You can read more information about EList objects by typing help("EList-class"). You can see that EList objects do not store correlations or blocks, so no information about correlations can be passed from voom to subsequent calls of lmFit.

It is clear from the structure of the functions that duplicateCorrelation has to be run before lmFit and the correlation stored. It is also clear that duplicateCorrelation can't use the weights from voom unless voom is run first. So the only question really is whether voom itself benefits from knowing the blocking structure, so does it need to be run once or twice? It doesn't make a huge difference whether voom is run once or twice, so both approaches are "correct". Originally we recommended one round, just for simplicity, but for the past 6 years we have been recommending two rounds as a compromise between safety and computational load.

0
Entering edit mode

Thank you very much, that makes complete sense. Apologies for not finding it when I searched.