oligo and parallel computing (on Windows)?
Guido Hooiveld ★ 4.1k
Wageningen University, Wageningen, the …

I am about to get a large microarray data set which I would like to analyze using oligo. To run the analyses as efficiently as possible, I am exploring the parallelization options mentioned in the oligo user guide (section 7.3; Parallel Computing on Multicore Machines, pages 46-47). See also code below.

However, I don't see any improvement when setting the environment variable R_THREADS to a higher number (in the user guide, doing so reduced the running time for this example by 50%), so I wonder whether that approach is (still) applicable to R on a Windows machine. Any advice would be appreciated.

NB: for me this feature is in the category "nice to have" rather than "need to have". :)
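
Before comparing timings, it may be worth confirming that the session actually sees more than one core and that R_THREADS is set as intended. The following is only a minimal sketch using base R and the parallel package; it does not check whether oligo itself honours the variable on Windows, which is the open question here.

```r
library(parallel)

# Number of logical cores R can see; detectCores() may return NA on some systems.
n_cores <- detectCores()

# Set the variable the same way as in the question, then read it back.
Sys.setenv(R_THREADS = 4)
r_threads <- as.integer(Sys.getenv("R_THREADS", unset = NA))

cat("Logical cores detected:", n_cores, "\n")
cat("R_THREADS as seen by this session:", r_threads, "\n")

# Asking for more threads than cores is unlikely to help.
if (!is.na(n_cores) && !is.na(r_threads) && r_threads > n_cores) {
  message("R_THREADS exceeds the detected core count.")
}
```

If the detected core count is 1 (or NA), no setting of R_THREADS can be expected to change the elapsed time.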


Sample code (from the user guide); I obtained the same results (i.e. the same elapsed times) when using my own (larger) data set.

> library(oligo)
> library(pd.huex.1.0.st.v2)
> library(oligoData)
> data(affyExonFS)
> t0 <- system.time(res0 <- rma(affyExonFS))
Background correcting
Calculating Expression
> Sys.setenv(R_THREADS=4)
> t1 <- system.time(res1 <- rma(affyExonFS))
Background correcting
Calculating Expression
> all.equal(res0, res1)
[1] TRUE
> t0
   user  system elapsed
  19.65    0.56   20.21
> t1
   user  system elapsed
  19.78    0.53   20.62
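
A single run per setting makes it hard to tell a real effect from run-to-run noise (20.21 s versus 20.62 s above). A rough way to check is to repeat each timing a few times. The helper `time_runs()` below is a hypothetical name introduced for illustration, not part of oligo, and the sorting workload is only a stand-in so the sketch runs without the exon data.

```r
# Hypothetical helper: run a zero-argument function n times and
# return the elapsed wall-clock time of each run, in seconds.
time_runs <- function(f, n = 3) {
  vapply(seq_len(n), function(i) system.time(f())[["elapsed"]], numeric(1))
}

# Stand-in workload so the example is self-contained.
workload <- function() invisible(sort(runif(1e5)))

times <- time_runs(workload, n = 3)
print(times)
print(median(times))

# With oligo and oligoData loaded, the same helper could be used as
# (not run here):
#   Sys.unsetenv("R_THREADS")
#   t_serial <- time_runs(function() rma(affyExonFS))
#   Sys.setenv(R_THREADS = 4)
#   t_threaded <- time_runs(function() rma(affyExonFS))
```

Comparing the medians of the two sets of runs gives a less noisy picture than comparing two single runs.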

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] oligoData_1.8.0          pd.huex.1.0.st.v2_3.14.1 DBI_0.7                 
 [4] RSQLite_2.0              oligo_1.40.2             Biostrings_2.44.2       
 [7] XVector_0.16.0           IRanges_2.10.2           S4Vectors_0.14.3        
[10] Biobase_2.36.2           oligoClasses_1.38.0      BiocGenerics_0.22.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12               compiler_3.4.1            
 [3] BiocInstaller_1.26.0       GenomeInfoDb_1.12.2       
 [5] bitops_1.0-6               iterators_1.0.8           
 [7] tools_3.4.1                zlibbioc_1.22.0           
 [9] digest_0.6.12              bit_1.1-12                
[11] memoise_1.1.0              tibble_1.3.4              
[13] preprocessCore_1.38.1      lattice_0.20-35           
[15] ff_2.2-13                  pkgconfig_2.0.1           
[17] rlang_0.1.2                Matrix_1.2-11             
[19] foreach_1.4.3              DelayedArray_0.2.7        
[21] GenomeInfoDbData_0.99.0    affxparser_1.48.0         
[23] bit64_0.9-7                grid_3.4.1                
[25] blob_1.1.0                 splines_3.4.1             
[27] codetools_0.2-15           matrixStats_0.52.2        
[29] GenomicRanges_1.28.4       SummarizedExperiment_1.6.3
[31] RCurl_1.95-4.8             affyio_1.46.0             




oligo parallel computation
Do you have a multicore Windows box? Mine has four cores:

> system.time(rma(affyExonFS))
Background correcting
Calculating Expression
   user  system elapsed
  15.84    0.27   16.30

> Sys.setenv(R_THREADS = 4)
> system.time(rma(affyExonFS))
Background correcting
Calculating Expression
   user  system elapsed
  12.54    0.12   16.11

That is a pretty unimpressive speedup. But even with ff/foreach/doMC on my Linux box I get unimpressive results, particularly given that it is a much more powerful machine than my little desktop:

> system.time(rma(affyExonFS))
Background correcting
Calculating Expression
   user  system elapsed
 30.092   1.092  39.739
> registerDoMC(4)
> system.time(rma(affyExonFS))
Background correcting
Calculating Expression
   user  system elapsed
 29.644   0.940  30.616
> registerDoMC(25)
> system.time(rma(affyExonFS))
Background correcting
Calculating Expression
   user  system elapsed
 30.012   1.068  31.112
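
To put numbers on "unimpressive", the elapsed times reported in this thread can be turned into speedup factors (serial elapsed divided by parallel elapsed). This is just arithmetic on the figures posted above; the row labels are mine.

```r
# Elapsed times (seconds) copied from the transcripts in this thread.
elapsed <- data.frame(
  run      = c("Windows, R_THREADS=4 (question)",
               "Windows, R_THREADS=4 (answer)",
               "Linux, registerDoMC(4)",
               "Linux, registerDoMC(25)"),
  serial   = c(20.21, 16.30, 39.739, 39.739),
  parallel = c(20.62, 16.11, 30.616, 31.112)
)

# Speedup > 1 means the parallel run was faster.
elapsed$speedup <- round(elapsed$serial / elapsed$parallel, 2)
print(elapsed)
# speedup column: 0.98, 1.01, 1.30, 1.28
```

So the best case here is roughly a 1.3-fold gain on Linux, and essentially nothing on Windows.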


Thanks for your feedback. My take-home message: the above is (still) the proper way to use multiple cores with oligo, but the impact is indeed not impressive.


