Question: oligo and parallel computing (on Windows)?
12 weeks ago, Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote:

I am about to get a large microarray data set which I would like to analyze using oligo. To run the analyses as efficiently as possible, I am exploring the parallelization options mentioned in the oligo user guide (section 7.3; Parallel Computing on Multicore Machines, pages 46-47). See also code below.

However, I don't see any improvement when setting the environment variable R_THREADS to a higher number (in the user guide, doing so reduced the running time for this example by 50%), so I wonder whether that approach is (still) applicable to R on a Windows machine. Any advice would be appreciated.
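Before timing anything, it may help to confirm that the machine actually exposes several cores and that the variable is visible to the running session. The following is a minimal sketch using only base R and the bundled parallel package; it is not taken from the oligo user guide. Note that it only shows that the variable is set. Whether the compiled code underneath rma() honors R_THREADS depends on how it was built on your platform, which this snippet cannot tell you.

```r
# Sketch: check core count and visibility of R_THREADS in this session.
library(parallel)  # ships with base R

# detectCores() may return NA on platforms where the count is unknown.
n_logical  <- detectCores(logical = TRUE)
n_physical <- detectCores(logical = FALSE)
cat("logical cores:", n_logical, " physical cores:", n_physical, "\n")

# Same call as in the question; the value is stored as a character string.
Sys.setenv(R_THREADS = 4)
threads <- Sys.getenv("R_THREADS")
cat("R_THREADS as seen by this session:", threads, "\n")
```

If only one core is reported, no setting of R_THREADS can be expected to reduce the elapsed time.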

NB: for me this feature is in the category "nice to have" rather than "need to have". :)

Sample code (from the user guide); I obtained the same results (i.e. the same elapsed times) when using my own (larger) data set.

> library(oligo)
> library(pd.huex.1.0.st.v2)
> library(oligoData)
> data(affyExonFS)
> t0 <- system.time(res0 <- rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
>
> Sys.setenv(R_THREADS=4)
> t1 <- system.time(res1 <- rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
>
> all.equal(res0, res1)
[1] TRUE
> t0
   user  system elapsed
  19.65    0.56   20.21
>
> t1
   user  system elapsed
  19.78    0.53   20.62
>
>
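A single pair of runs that differ by a few tenths of a second on a 20-second job is hard to tell apart from run-to-run noise. One way to make such a comparison more robust is to repeat each timing a few times and compare medians. The sketch below uses only base R; time_repeated() and workload() are hypothetical names introduced here for illustration, and the workload is a stand-in so the snippet runs without oligo.

```r
# Hypothetical helper: call a zero-argument function n times and
# return the elapsed wall-clock time (seconds) of each call.
time_repeated <- function(f, n = 5) {
  vapply(seq_len(n), function(i) system.time(f())[["elapsed"]], numeric(1))
}

# Stand-in workload so the sketch runs on its own. With oligo and
# oligoData loaded, replace it with: function() rma(affyExonFS)
workload <- function() sum(sort(runif(1e5)))

elapsed <- time_repeated(workload, n = 5)
cat("median:", median(elapsed), " range:", range(elapsed), "\n")
```

Running the helper once with R_THREADS unset and once with it set gives two sets of timings whose medians and ranges can be compared directly.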

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] oligoData_1.8.0          pd.huex.1.0.st.v2_3.14.1 DBI_0.7                 
 [4] RSQLite_2.0              oligo_1.40.2             Biostrings_2.44.2       
 [7] XVector_0.16.0           IRanges_2.10.2           S4Vectors_0.14.3        
[10] Biobase_2.36.2           oligoClasses_1.38.0      BiocGenerics_0.22.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12               compiler_3.4.1            
 [3] BiocInstaller_1.26.0       GenomeInfoDb_1.12.2       
 [5] bitops_1.0-6               iterators_1.0.8           
 [7] tools_3.4.1                zlibbioc_1.22.0           
 [9] digest_0.6.12              bit_1.1-12                
[11] memoise_1.1.0              tibble_1.3.4              
[13] preprocessCore_1.38.1      lattice_0.20-35           
[15] ff_2.2-13                  pkgconfig_2.0.1           
[17] rlang_0.1.2                Matrix_1.2-11             
[19] foreach_1.4.3              DelayedArray_0.2.7        
[21] GenomeInfoDbData_0.99.0    affxparser_1.48.0         
[23] bit64_0.9-7                grid_3.4.1                
[25] blob_1.1.0                 splines_3.4.1             
[27] codetools_0.2-15           matrixStats_0.52.2        
[29] GenomicRanges_1.28.4       SummarizedExperiment_1.6.3
[31] RCurl_1.95-4.8             affyio_1.46.0             
>

12 weeks ago, James W. MacDonald (United States) wrote:

Do you have a multicore Windows box? Mine has four:

> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
  15.84    0.27   16.30

> Sys.setenv(R_THREADS = 4)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
  12.54    0.12   16.11

That is a pretty unimpressive speedup: user time drops, but elapsed time is essentially unchanged. Even with ff/foreach/doMC on my Linux box I get unimpressive results, particularly since that machine is way nicer than my little desktop:

> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 30.092   1.092  39.739
> registerDoMC(4)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 29.644   0.940  30.616
> registerDoMC(25)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 30.012   1.068  31.112
>
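To put the numbers in this thread side by side, the elapsed times from the transcripts above can be turned into speedup ratios (baseline elapsed divided by elapsed with the parallel setting). This is just arithmetic on the posted figures, sketched in base R; the data frame and its column names are made up for illustration.

```r
# Elapsed times (seconds) copied from the transcripts in this thread.
timings <- data.frame(
  run = c("Windows (question), R_THREADS=4",
          "Windows (answer), R_THREADS=4",
          "Linux, registerDoMC(4)",
          "Linux, registerDoMC(25)"),
  baseline     = c(20.21, 16.30, 39.739, 39.739),
  with_setting = c(20.62, 16.11, 30.616, 31.112)
)

# Speedup > 1 means the parallel setting was faster than the baseline.
timings$speedup <- round(timings$baseline / timings$with_setting, 2)
print(timings)
```

On these figures the Windows runs show essentially no change in elapsed time, and the Linux runs gain roughly 1.3x, with no further gain from registering 25 workers instead of 4.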

 

Guido Hooiveld replied 12 weeks ago:

Thanks for your feedback. My take-home message: the approach above is (still) the proper way to use multiple cores with oligo, but the impact is indeed not impressive.