TPP2D - multicore computing for bootstrapNullAlternativeModel(...)
Tobias ▴ 40
@tobias-24288
Last seen 7 months ago
Switzerland

Hello bioc users,

Does anyone know a way to use multiple cores during the null model fitting step of the TPP2D package? A test run with only 2 bootstrap iterations (B = 2) takes a very long time and runs on a single CPU.


> ### null model fitting
> fstat_df <- computeFStatFromParams(model_params_df)
> set.seed(12, kind = "L'Ecuyer-CMRG")
> ## next step is very sloooooooow
> ## short test B = 2
> null_model_B2 <- bootstrapNullAlternativeModel(df = preproc_df, params_df = model_params_df, B = 2)
[1] "Warning: You have specificed B < 20, it is recommended to use at least B = 20 in order to obtain reliable results."
  |===================================================================================================| 100%


> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TPP2D_1.10.0 dplyr_1.0.8 

loaded via a namespace (and not attached):
 [1] zip_2.2.0           Rcpp_1.0.8.3        pillar_1.7.0        compiler_4.1.2      bitops_1.0-7       
 [6] iterators_1.0.14    tools_4.1.2         lifecycle_1.0.1     tibble_3.1.6        gtable_0.3.0       
[11] lattice_0.20-45     pkgconfig_2.0.3     rlang_1.0.2         openxlsx_4.2.5      foreach_1.5.2      
[16] rstudioapi_0.13     DBI_1.1.2           cli_3.2.0           parallel_4.1.2      stringr_1.4.0      
[21] generics_0.1.2      vctrs_0.4.0         grid_4.1.2          tidyselect_1.1.2    glue_1.6.2         
[26] R6_2.5.1            fansi_1.0.3         BiocParallel_1.28.3 limma_3.50.1        tidyr_1.2.0        
[31] ggplot2_3.3.5       purrr_0.3.4         magrittr_2.0.3      scales_1.1.1        codetools_0.2-18   
[36] ellipsis_0.3.2      MASS_7.3-56         assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.2         
[41] stringi_1.7.6       RCurl_1.98-1.6      munsell_0.5.0       doParallel_1.0.17   crayon_1.5.1

According to the function documentation, the BPPARAM parameter is:

BPPARAM BiocParallel parameter for optional parallelization of null distribution generation through bootstrapping, default: BiocParallel::SerialParam()

Executing that on my system gives:

> BiocParallel::SerialParam()
class: SerialParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE; bpforceGC: FALSE
  bplogdir: NA
  bpresultdir: NA

I guess this means the function was prepared to use a parallel computing backend, but by default it runs on a single CPU?


I found a way that worked on my MacBook Pro and also on Linux (Debian 10):

##### multicore version #####
library(BiocParallel)
null_model_B20_mc <- bootstrapNullAlternativeModel(df = preproc_df, params_df = model_params_df, B = 20, BPPARAM = MulticoreParam())
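
The call above relies on the defaults of MulticoreParam(). A minimal sketch of setting the backend up more explicitly follows, assuming you want to cap the number of workers and fall back to SnowParam() on Windows, where forking is not available. The helper choose_bpparam() and its arguments are hypothetical names used for illustration and are not part of TPP2D or BiocParallel. RNGseed is a BiocParallel argument; whether it makes the TPP2D bootstrap fully reproducible depends on how the package draws its random numbers, which is not verified here.

```r
library(BiocParallel)

# Hypothetical helper (not part of TPP2D or BiocParallel): pick a backend
# that suits the operating system and set the number of workers.
choose_bpparam <- function(workers = NULL, seed = NULL) {
  if (is.null(workers)) {
    # Leave one core free; detectCores() can return NA, so fall back to 1.
    workers <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
  }
  if (.Platform$OS.type == "windows") {
    # Forking is unavailable on Windows, so use socket-based workers.
    SnowParam(workers = workers, type = "SOCK", RNGseed = seed)
  } else {
    # Forked workers on Linux and macOS.
    MulticoreParam(workers = workers, RNGseed = seed)
  }
}

param <- choose_bpparam(workers = 2, seed = 12)
bpnworkers(param)  # number of workers the backend will use

# Usage with the objects from the original post (assumed to exist):
# null_model_B20_mc <- bootstrapNullAlternativeModel(
#   df = preproc_df, params_df = model_params_df, B = 20, BPPARAM = param)
```

Keeping the worker count below the number of physical cores is a judgment call: each forked worker may hold its own copy of the data, so memory, not CPU, can become the limit.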
@james-w-macdonald-5106
Last seen 8 hours ago
United States

You are already halfway to figuring this out for yourself. Here's how I would proceed.

The documentation for the 'BPPARAM' argument mentions 'optional parallelization' and points to BiocParallel::SerialParam. Note that this is a qualified function name: the first part, BiocParallel, is the package that provides the function SerialParam. Your next step should be to explore BiocParallel to see what other backends are available, for example by reading the vignette, which describes the various ways of parallelizing and the cases in which each is applicable. You know what sort of computer you are using, so you should then be able to decide which parallelization scheme fits.
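
As a rough sketch of that exploration, the lines below list the backends BiocParallel has registered for the current session and run a toy computation once serially and once with the session default. The squaring function is only a stand-in for real work, and which backend is the default depends on your platform and configuration.

```r
library(BiocParallel)

# Backends registered for this session; the first entry is the default.
registered()

# The default backend that bpparam() would hand to a function.
default_param <- bpparam()
class(default_param)

# Toy workload: the same call works with any backend passed as BPPARAM.
squares_serial  <- bplapply(1:4, function(i) i^2, BPPARAM = SerialParam())
squares_default <- bplapply(1:4, function(i) i^2, BPPARAM = default_param)

# Both backends should return the same list of results.
identical(squares_serial, squares_default)

# The vignettes describe when each backend is applicable:
# browseVignettes("BiocParallel")
```

Any function that exposes a BPPARAM argument, such as bootstrapNullAlternativeModel(), accepts these backend objects in the same way.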
