The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Is lmFit() the fastest way to fit thousands of linear model in R and use what algorithm make it so fast ?
0
gravatar for heyao
6 weeks ago by
heyao30
heyao30 wrote:

Hi all,

I just noticed that lmFit() is quite fast for linear model fit even for very big single cell data.

This make me curious that how fast does lmFit() achieve but I haven't found any benchmarking results about that.

If lmFit() is actually the fastest way for simple linear model fit,  I guess I would like to always use lmFit() instead of tidyr+purrr+lm() .

If lmFit() is fast because it uses some algorithms where can I find the implemention details ? Assuming I want to write a silimar function using other language like python or julia for my own use.
 

limma • 92 views
ADD COMMENTlink modified 6 weeks ago by James W. MacDonald49k • written 6 weeks ago by heyao30

I recently had RegParallel accepted to Bioconductor, which can fit independent models via lm(), glm(), bayesglm(), coxph(), et cetera, and parallelise this process (also 'nested' parallel processing is possible). It is not as quick as lmFit, though. I developed it while having to run conditional logistic regression models on old GWAS data for trios.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Kevin Blighe30
Answer: Is lmFit() the fastest way to fit thousands of linear model in R and use what a
0
gravatar for James W. MacDonald
6 weeks ago by
United States
James W. MacDonald49k wrote:

Yes, lmFit is super fast, mainly because it directly calls (and manipulates data from) lm.fit or lm.wfit, which call C code to do the work. I cannot imagine that some concatenation of tidyr and purr and lm would be even remotely close because that almost certainly involves fitting each model separately whereas lm.fit is fitting all the models simultaneously. But then I avoid the tidyverse like a plague, so maybe I am not informed enough to say whether or not some combination of tidyr and purr and lm would be reasonable.

You can find the implementation details in the source code for lmFit (which calls lm.series or glm.series or mrlm, depending on arguments), all of which end up calling either lm.fit or lm.wfit, with the right arguments depending on the model being fit. And inside of both lm.fit and lm.wfit are calls to the C code, which you can get by downloading the R sources.

ADD COMMENTlink written 6 weeks ago by James W. MacDonald49k

lm.fit is probably as fast as it can possibly be; the heavy lifting is done by the C code with LINPACK routines in a vectorized manner across all genes. Unconditional speed increases would probably require fiddling around with the compilation settings for BLAS or some other low-level stuff that would be beyond the ability of the average user.

limma's handling of weighted linear models could, in theory, be sped up by migrating the heavy lifting to C(++). Currently, lmFit calls lm.wfit in a R-level loop across all genes, and looping in R is generally a good candidate for optimization. However, this is not a priority - limma is plenty fast enough as it is, and frankly, we have better things to do.

ADD REPLYlink written 6 weeks ago by Aaron Lun22k

Thank you  James and Aaron, the answer and comments explain what I want to know clearly !

ADD REPLYlink written 6 weeks ago by heyao30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 201 users visited in the last hour