I am using limma for a large analysis (>5,000 samples, >300 covariates) of single-channel microarray data without probe quality weights. I find that arrayWeights() is unusably slow in this large-sample, large-coefficient setting (arrayWeights() has been running for >24 hours and is still going).
Therefore, I am interested in using arrayWeightsQuick() as an alternative to arrayWeights(). However, arrayWeightsQuick() cannot be a drop-in replacement for arrayWeights() because the function signature of arrayWeightsQuick() requires the output of lmFit(). I believe this means that arrayWeightsQuick() is used at a different stage of the workflow relative to arrayWeights().
Here is my current workflow using arrayWeights():
weights <- arrayWeights(expression, design=design) # unusably slow!
fit <- lmFit(expression, design=design, weights=weights)
Is the following 2-step procedure a proper way to substitute arrayWeightsQuick() for arrayWeights()?
prefit <- lmFit(expression, design=design) # temporary
weights <- arrayWeightsQuick(expression, prefit)
fit <- lmFit(expression, design=design, weights=weights)
Relatedly: the user guide for limma mentions arrayWeightsSimple(), but no such function exists. Perhaps arrayWeightsSimple() was renamed to arrayWeightsQuick() in the code, but the documentation is not yet updated accordingly?

What sort of object is expression? Does it contain any missing values?

In the specific use-case I have in mind, expression is a simple matrix of log2 gene expression per sample, with no missing values.

Yes, your proposed approach is correct. Using arrayWeightsQuick() after an initial lmFit() is suitable for large datasets and is faster than arrayWeights(). Ensure your expression matrix has no missing values for optimal performance.
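
For reference, here is a minimal sketch of the two-step procedure on simulated data, assuming limma is installed. The matrix dimensions, the two-group design, and the deliberately noisy twelfth array are illustrative assumptions and are not taken from the original question; only lmFit() and arrayWeightsQuick() are the calls under discussion.

```r
library(limma)  # assumes the Bioconductor package limma is installed

set.seed(1)
n_genes <- 200
n_samples <- 12

# Simulated stand-in for the real data: a plain numeric matrix of
# log2 expression values (genes in rows, samples in columns), no NAs.
expression <- matrix(rnorm(n_genes * n_samples),
                     nrow = n_genes, ncol = n_samples)

# Illustrative assumption: make the last array noisier than the rest,
# so that it should end up with a comparatively low weight.
expression[, n_samples] <- expression[, n_samples] + rnorm(n_genes, sd = 3)

# Hypothetical two-group design (intercept plus one group effect).
group <- factor(rep(c("a", "b"), each = n_samples / 2))
design <- model.matrix(~ group)

# Step 1: unweighted preliminary fit, used only as input for the weights.
prefit <- lmFit(expression, design = design)

# Step 2: one relative reliability weight per array (column).
weights <- arrayWeightsQuick(expression, prefit)

# Step 3: refit the same model using the array weights.
fit <- lmFit(expression, design = design, weights = weights)

print(round(weights, 2))
```

In this sketch, weights is a numeric vector with one entry per column of expression, which is the form that the weights argument of lmFit() accepts for array-level weights.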