oligo and parallel computing (on Windows)?
Guido Hooiveld
@guido-hooiveld-2020
Wageningen University, Wageningen, the …

I am about to receive a large microarray data set that I would like to analyze using oligo. To run the analyses as efficiently as possible, I am exploring the parallelization options described in the oligo user guide (section 7.3, Parallel Computing on Multicore Machines, pages 46-47); see also the code below.

However, I don't see any improvement when I set the environment variable R_THREADS to a higher number (in the user guide, doing so reduced the running time for this example by 50%), so I wonder whether that approach is (still) applicable to R on a Windows machine. Any advice would be appreciated.

NB: for me this feature is in the category "nice to have" rather than "need to have". :)

Sample code (from the user guide); I obtained the same results (i.e., the same elapsed times) when using my own, larger data set.

> library(oligo)
> library(pd.huex.1.0.st.v2)
> library(oligoData)
> data(affyExonFS)
> t0 <- system.time(res0 <- rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
>
> Sys.setenv(R_THREADS=4)
> t1 <- system.time(res1 <- rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
>
> all.equal(res0, res1)
[1] TRUE
> t0
   user  system elapsed
  19.65    0.56   20.21
>
> t1
   user  system elapsed
  19.78    0.53   20.62
>
>

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] oligoData_1.8.0          pd.huex.1.0.st.v2_3.14.1 DBI_0.7                 
 [4] RSQLite_2.0              oligo_1.40.2             Biostrings_2.44.2       
 [7] XVector_0.16.0           IRanges_2.10.2           S4Vectors_0.14.3        
[10] Biobase_2.36.2           oligoClasses_1.38.0      BiocGenerics_0.22.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12               compiler_3.4.1            
 [3] BiocInstaller_1.26.0       GenomeInfoDb_1.12.2       
 [5] bitops_1.0-6               iterators_1.0.8           
 [7] tools_3.4.1                zlibbioc_1.22.0           
 [9] digest_0.6.12              bit_1.1-12                
[11] memoise_1.1.0              tibble_1.3.4              
[13] preprocessCore_1.38.1      lattice_0.20-35           
[15] ff_2.2-13                  pkgconfig_2.0.1           
[17] rlang_0.1.2                Matrix_1.2-11             
[19] foreach_1.4.3              DelayedArray_0.2.7        
[21] GenomeInfoDbData_0.99.0    affxparser_1.48.0         
[23] bit64_0.9-7                grid_3.4.1                
[25] blob_1.1.0                 splines_3.4.1             
[27] codetools_0.2-15           matrixStats_0.52.2        
[29] GenomicRanges_1.28.4       SummarizedExperiment_1.6.3
[31] RCurl_1.95-4.8             affyio_1.46.0             
>

@james-w-macdonald-5106
United States

Do you have a multicore Windows box? Mine has four cores:

> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
  15.84    0.27   16.30

> Sys.setenv(R_THREADS = 4)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
  12.54    0.12   16.11

Which is a pretty unimpressive speedup. But even with ff/foreach/doMC on my Linux box, which is way nicer than my little desktop, I get unimpressive results:

> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 30.092   1.092  39.739
> registerDoMC(4)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 29.644   0.940  30.616
> registerDoMC(25)
> system.time(rma(affyExonFS))
Background correcting
Normalizing
Calculating Expression
   user  system elapsed
 30.012   1.068  31.112
>
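
A likely reason the ff/foreach/doMC route does not carry over to Windows is that doMC runs its workers as forked R processes, and forking is not available on Windows. The sketch below uses only base R's parallel package and a toy function (square_it() is hypothetical, not part of oligo) to contrast forked workers with a PSOCK socket cluster, which does run in parallel on Windows. Registering such a cluster for foreach, e.g. with doParallel::registerDoParallel(cl), would be the socket-based counterpart to registerDoMC(4); whether oligo's rma() actually benefits from that on Windows is an untested assumption here.

```r
# Minimal sketch using only base R's 'parallel' package; square_it() is a
# hypothetical stand-in for real work, not an oligo function.
library(parallel)

square_it <- function(i) i^2

# Forked workers (mclapply, and doMC, which builds on it) need fork(),
# which Windows lacks; there mc.cores must be 1, i.e. no parallelism.
n_fork <- if (.Platform$OS.type == "windows") 1L else 2L
res_fork <- mclapply(1:4, square_it, mc.cores = n_fork)

# A PSOCK cluster launches separate R worker processes that communicate
# over sockets, so it also runs in parallel on Windows.
cl <- makeCluster(2)  # type = "PSOCK" is the default
res_psock <- parLapply(cl, 1:4, square_it)
stopCluster(cl)

unlist(res_psock)  # 1 4 9 16
```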

 


Thanks for your feedback. My take-home message: the above is (still) the proper way of using multiple cores with oligo, but the impact is indeed not impressive.
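
To put a number on "not impressive", the wall-clock speedup is the ratio of the elapsed times without and with parallelism (the 50% reduction reported in the user guide corresponds to a ratio of 2). The base-R sketch below computes it from the elapsed times posted in this thread; speedup() and median_elapsed() are hypothetical helpers, not oligo functions. A single timing per setting is noisy (the first Linux run, for instance, shows a much larger gap between user and elapsed time than the later runs), so median_elapsed() repeats a call and takes the median.

```r
# Minimal base-R sketch; speedup() and median_elapsed() are hypothetical
# helpers for this thread, not part of oligo.

# Ratio of serial to parallel wall-clock time; > 1 means parallel is faster.
# Accepts system.time() results or named vectors with an "elapsed" entry.
speedup <- function(t_serial, t_parallel) {
  unname(t_serial[["elapsed"]] / t_parallel[["elapsed"]])
}

# Call f() `reps` times and return the median elapsed (wall-clock) time.
median_elapsed <- function(f, reps = 3) {
  times <- vapply(seq_len(reps),
                  function(i) system.time(f())[["elapsed"]],
                  numeric(1))
  median(times)
}

# Elapsed times copied from the transcripts in this thread.
speedup(c(elapsed = 20.21), c(elapsed = 20.62))    # Windows, R_THREADS: ~0.98
speedup(c(elapsed = 16.30), c(elapsed = 16.11))    # Windows, R_THREADS: ~1.01
speedup(c(elapsed = 39.739), c(elapsed = 30.616))  # Linux, doMC(4):     ~1.30
speedup(c(elapsed = 39.739), c(elapsed = 31.112))  # Linux, doMC(25):    ~1.28

# With oligo and oligoData loaded, the same helpers could compare settings:
# t_serial   <- median_elapsed(function() rma(affyExonFS))
# Sys.setenv(R_THREADS = 4)
# t_parallel <- median_elapsed(function() rma(affyExonFS))
# t_serial / t_parallel
```

speedup() also works directly on the system.time() objects from the question, e.g. speedup(t0, t1).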
