Is lmFit() the fastest way to fit thousands of linear model in R and use what algorithm make it so fast ?
1
0
Entering edit mode
heyao ▴ 30
@heyao-14543
Last seen 4.1 years ago

Hi all,

I just noticed that lmFit() is quite fast for linear model fit even for very big single cell data.

This make me curious that how fast does lmFit() achieve but I haven't found any benchmarking results about that.

If lmFit() is actually the fastest way for simple linear model fit,  I guess I would like to always use lmFit() instead of tidyr+purrr+lm() .

If lmFit() is fast because it uses some algorithms where can I find the implemention details ? Assuming I want to write a silimar function using other language like python or julia for my own use.
 

limma • 3.6k views
ADD COMMENT
0
Entering edit mode

I recently had RegParallel accepted to Bioconductor, which can fit independent models via lm(), glm(), bayesglm(), coxph(), et cetera, and parallelise this process (also 'nested' parallel processing is possible). It is not as quick as lmFit, though. I developed it while having to run conditional logistic regression models on old GWAS data for trios.

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States

Yes, lmFit is super fast, mainly because it directly calls (and manipulates data from) lm.fit or lm.wfit, which call C code to do the work. I cannot imagine that some concatenation of tidyr and purr and lm would be even remotely close because that almost certainly involves fitting each model separately whereas lm.fit is fitting all the models simultaneously. But then I avoid the tidyverse like a plague, so maybe I am not informed enough to say whether or not some combination of tidyr and purr and lm would be reasonable.

You can find the implementation details in the source code for lmFit (which calls lm.series or glm.series or mrlm, depending on arguments), all of which end up calling either lm.fit or lm.wfit, with the right arguments depending on the model being fit. And inside of both lm.fit and lm.wfit are calls to the C code, which you can get by downloading the R sources.

ADD COMMENT
0
Entering edit mode

lm.fit is probably as fast as it can possibly be; the heavy lifting is done by the C code with LINPACK routines in a vectorized manner across all genes. Unconditional speed increases would probably require fiddling around with the compilation settings for BLAS or some other low-level stuff that would be beyond the ability of the average user.

limma's handling of weighted linear models could, in theory, be sped up by migrating the heavy lifting to C(++). Currently, lmFit calls lm.wfit in a R-level loop across all genes, and looping in R is generally a good candidate for optimization. However, this is not a priority - limma is plenty fast enough as it is, and frankly, we have better things to do.

ADD REPLY
0
Entering edit mode

Thank you  James and Aaron, the answer and comments explain what I want to know clearly !

ADD REPLY

Login before adding your answer.

Traffic: 552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6