Question: Is lmFit() the fastest way to fit thousands of linear model in R and use what algorithm make it so fast ?
gravatar for heyao
9 months ago by
heyao30 wrote:

Hi all,

I just noticed that lmFit() is quite fast for linear model fit even for very big single cell data.

This make me curious that how fast does lmFit() achieve but I haven't found any benchmarking results about that.

If lmFit() is actually the fastest way for simple linear model fit,  I guess I would like to always use lmFit() instead of tidyr+purrr+lm() .

If lmFit() is fast because it uses some algorithms where can I find the implemention details ? Assuming I want to write a silimar function using other language like python or julia for my own use.

limma • 356 views
ADD COMMENTlink modified 9 months ago by James W. MacDonald51k • written 9 months ago by heyao30

I recently had RegParallel accepted to Bioconductor, which can fit independent models via lm(), glm(), bayesglm(), coxph(), et cetera, and parallelise this process (also 'nested' parallel processing is possible). It is not as quick as lmFit, though. I developed it while having to run conditional logistic regression models on old GWAS data for trios.

ADD REPLYlink modified 9 months ago • written 9 months ago by Kevin Blighe200
Answer: Is lmFit() the fastest way to fit thousands of linear model in R and use what a
gravatar for James W. MacDonald
9 months ago by
United States
James W. MacDonald51k wrote:

Yes, lmFit is super fast, mainly because it directly calls (and manipulates data from) or lm.wfit, which call C code to do the work. I cannot imagine that some concatenation of tidyr and purr and lm would be even remotely close because that almost certainly involves fitting each model separately whereas is fitting all the models simultaneously. But then I avoid the tidyverse like a plague, so maybe I am not informed enough to say whether or not some combination of tidyr and purr and lm would be reasonable.

You can find the implementation details in the source code for lmFit (which calls lm.series or glm.series or mrlm, depending on arguments), all of which end up calling either or lm.wfit, with the right arguments depending on the model being fit. And inside of both and lm.wfit are calls to the C code, which you can get by downloading the R sources.

ADD COMMENTlink written 9 months ago by James W. MacDonald51k is probably as fast as it can possibly be; the heavy lifting is done by the C code with LINPACK routines in a vectorized manner across all genes. Unconditional speed increases would probably require fiddling around with the compilation settings for BLAS or some other low-level stuff that would be beyond the ability of the average user.

limma's handling of weighted linear models could, in theory, be sped up by migrating the heavy lifting to C(++). Currently, lmFit calls lm.wfit in a R-level loop across all genes, and looping in R is generally a good candidate for optimization. However, this is not a priority - limma is plenty fast enough as it is, and frankly, we have better things to do.

ADD REPLYlink written 9 months ago by Aaron Lun25k

Thank you  James and Aaron, the answer and comments explain what I want to know clearly !

ADD REPLYlink written 9 months ago by heyao30
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 303 users visited in the last hour