R and High performance computing - Functions for Bioconductor users with large datasets

SPRINT

Dear List Member,

A brief announcement and request for help regarding R and High Performance Computing.

Over the past 3 years, the open-source SPRINT project has focussed on the parallelisation of R functions used by biostatisticians and other data analysts. On the basis of feedback from the biostatistical community in 2010, the SPRINT team at the University of Edinburgh developed the following 7 parallelised functions of generic utility in the analysis of large data matrices:

- papply(): an apply function
- pboot(): a bootstrapping function
- pcor(): a Pearson correlation function
- pmaxT(): a permutation test function
- ppam(): a clustering function (partitioning around medoids)
- prandomForest(): a machine learning classifier function
- pRP(): a rank product analysis function

More information can be found here: www.r-sprint.org

The latest version of SPRINT (v1.0.4) is available from CRAN and, for the first time, runs on Apple OS X. SPRINT has always been scalable from desktop to cluster to HPC facility; it can now also take advantage of multi-core hardware in both Linux and OS X environments.

SPRINT v1.0.4 software and documentation are available here: http://cran.r-project.org/web/packages/sprint/index.html

A request for help

The SPRINT team would like to revise and refresh our understanding of the needs and requirements of the R community for High Performance Computing. To achieve this, we've written a brief questionnaire (no more than 15 minutes), which we hope will allow us to capture needs and prioritise SPRINT development over the next 18 months.

It would be incredibly helpful if you could take a few moments to complete this questionnaire and tell us more about your R/HPC usage and/or any problems you may have with large and demanding data analyses. The questionnaire will be open until 4th March at:

https://www.survey.ed.ac.uk/2013_sprint/

After the questionnaire has closed, we'll analyse the data, make the results available (most likely via r-sprint.org) and prioritise our development of new functionality for SPRINT.

Thanks very much in advance for your help with our requirements analysis! My apologies if you subsequently receive a similar request from us via other mailing lists to which you subscribe.

If you have any queries regarding the use of SPRINT (including installation), please feel free to contact us at: sprint at ed.ac.uk

All the best to everyone,

Kevin Robertson

------------------------------------------------
SPRINT - A parallel framework for R
sprint at ed.ac.uk
www.r-sprint.org

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Tags: Clustering


shirley zhang

Dear Kevin,

I really need a parallel implementation of both apply() and lapply(). Thanks for your email letting me know about SPRINT.

The SPRINT manual says that, for papply(), the data can be an array, a list or an ff object. I just tried papply() on a list, but got an error:

> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))
> papply(x, mean)
Error in papply(x, mean) : could not find function "is.ff"

It runs fine with lapply(x, mean). Could you let me know whether I am missing anything?

Thanks,
Shirley

On Thu, Feb 14, 2013 at 6:09 AM, SPRINT <sprint@ed.ac.uk> wrote:
> Dear List Member,
> A brief announcement and request for help re. R and High Performance
> Computing.
> ...
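Shirley's message asks for parallel versions of apply() and lapply(). SPRINT's own papply() is not exercised here. As a swapped-in alternative (this is not SPRINT's API), the minimal sketch below uses the parallel package that ships with base R (2.14.0 and later): parLapply() on the list from her example, and parApply() on a small matrix chosen purely for illustration. The two-worker cluster size is an assumption; adjust it to the available cores.

```r
# A minimal sketch using base R's 'parallel' package (R >= 2.14.0),
# not SPRINT: parallel equivalents of lapply() and apply().
library(parallel)

# The list from the example above
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))

# Illustrative matrix (assumption): 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)

# Start a socket (PSOCK) cluster with 2 worker processes (assumed size);
# socket clusters work on Linux, OS X and Windows
cl <- makeCluster(2)

# Parallel equivalent of lapply(x, mean); element names are preserved
par_means <- parLapply(cl, x, mean)

# Parallel equivalent of apply(m, 1, sum): one sum per row
par_row_sums <- parApply(cl, m, 1, sum)

# Shut the worker processes down when finished
stopCluster(cl)

print(par_means)
print(par_row_sums)
```

On Linux and OS X, mclapply(x, mean, mc.cores = 2) is a fork-based alternative that avoids setting up a cluster, although it does not support mc.cores > 1 on Windows.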

