Question: trigger package fails at parallelization - Transcriptional Regulatory Inference from Genetics of ExpRession
affennacken (Netherlands) wrote 5.1 years ago:

Dear Bioconductor Community,

The reference manual (October 21, 2014) for the Bioconductor trigger package states that it runs calculations in parallel, at least on large datasets (p. 11: trigger.mlink-methods; p. 12: trigger.net-method). That makes sense, because a large number of permutations may be involved. However, I cannot get parallel processing to work, either on the minimal example provided below or on larger datasets. As shown in the example below, I am using doMC for parallelization. Should I install a parallelization package other than doMC? Am I misreading the reference manual? Or is the trigger package buggy in this respect?

Help is greatly appreciated,
Kind regards,

Jonas

 

No parallel processing is achieved using the following code:

library(doMC)
library(trigger)
## registering multiple cores
registerDoMC(cores = 4)
## loading trigger accompanied data:
data(yeast)
attach(yeast)
## sample gene indexes to idx
set.seed(666)
idx <- c(unique(sort(sample(1:nrow(exp), size = 150, replace = F)),383,590,5003,4949))
my_trigger <- trigger.build(exp = exp[idx,], exp.pos = exp.pos[idx,], marker=marker, marker.pos = marker.pos)
my_loclink <- trigger.loclink(my_trigger, window.size = 30000)
my_mlink <- trigger.mlink(my_loclink, B = 100,seed = 666)

 

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   trigger_1.10.0
[5] qtl_1.33-7      corpcor_1.6.7  

loaded via a namespace (and not attached):
[1] codetools_0.2-9 qvalue_1.38.0   sva_3.10.0      tcltk_3.1.1    
[5] tools_3.1.1  

 

Answer: trigger package fails at parallelization - Transcriptional Regulatory Inference
Valerie Obenchain (United States) wrote 5.1 years ago:

Hi Jonas,

Functions in the trigger package are not themselves run in parallel. I believe the authors intended 'idx' to be used as the chunking argument to a parallel function outside the package. You can do this with doMC; you just need a foreach object and evaluation with %dopar%.

library(doMC)
cores <- 4
registerDoMC(cores = cores)
...
...
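
Before splitting the work, it can help to confirm that the backend is actually registered and that %dopar% runs tasks in separate processes. This is only a minimal sketch, assuming doMC and foreach are installed and the platform supports forking (Linux or macOS); the variable name pids is purely illustrative.

```r
library(doMC)   # attaches foreach as a dependency

cores <- 4
registerDoMC(cores = cores)

# Report which backend foreach will use and how many workers it sees
getDoParRegistered()   # TRUE once a backend is registered
getDoParName()
getDoParWorkers()

# Each task returns the process ID of the worker that ran it
pids <- foreach(i = seq_len(cores), .combine = c) %dopar% Sys.getpid()

# More than one distinct PID indicates that tasks ran in separate processes
length(unique(pids))
```

If only a single PID comes back, the tasks were probably run sequentially and the registration step is worth rechecking.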

The gene index should be a list. For this example I'll split into approximately equal groups across the number of workers.

nrows <- nrow(my_loclink@exp)
idx <- split(seq_len(nrows), ceiling(seq_len(nrows)/(nrows/cores)))

> length(idx)
[1] 4
> elementLengths(idx)
 1  2  3  4 
37 38 37 38 

Create a foreach object and R expression then evaluate them with %dopar%. 

res <- foreach(i = idx) %dopar% {
    trigger.mlink(my_loclink, B=100, i=i, seed=666) }
> res <- foreach(i = idx) %dopar% {
+     trigger.mlink(my_loclink, B=100, i=i, seed=666) }
Error in { : 
  task 1 failed - "Please select at least 100 genes to compute multi-locus linkage for them"

Looks like we need at least 100 genes in each list element for a user-supplied 'idx'. This data set is small, only 150 genes, so we'll fake it just to demonstrate the parallel example.

idx <- list(1:100, 1:100)
res <- foreach(i = idx) %dopar% {
    trigger.mlink(my_loclink, B=100, i=i, seed=666) }

4 cores were specified but the list is length 2 so we only see 2 workers working ...

> res <- foreach(i = idx) %dopar% {
+     trigger.mlink(my_loclink, B=100, i=i, seed=666) }
[1] Start to calculate multi-locus linkage statistics ...
[1] Start to calculate multi-locus linkage statistics ...
[1] 10% completed
[1] 10% completed
[1] 20% completed
[1] 20% completed
[1] 30% completed
[1] 30% completed
...

and the result -

> res
[[1]]
*** TRIGGER object *** 
Marker matrix with  3244 rows and  112 columns 
Expression matrix with  150 rows and  112 columns 

[[2]]
*** TRIGGER object *** 
Marker matrix with  3244 rows and  112 columns 
Expression matrix with  150 rows and  112 columns 
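
On a larger data set, the index list could be built so that no chunk drops below the minimum size. The helper below is a sketch in base R and is not part of trigger: chunk_indices() and its min_size argument are hypothetical names, and the default of 100 is taken from the error message above rather than from the package documentation.

```r
# Hypothetical helper: split 1..n into contiguous chunks, using at most
# `workers` chunks and never making a chunk smaller than `min_size`
# (if n itself is smaller than min_size, a single chunk is returned).
chunk_indices <- function(n, workers, min_size = 100L) {
    n_chunks <- max(1L, min(workers, n %/% min_size))
    # Assign chunk labels 1..n_chunks as evenly as possible, then sort so
    # that each chunk holds a contiguous run of indices
    groups <- sort(rep_len(seq_len(n_chunks), n))
    split(seq_len(n), groups)
}

# 150 genes and 4 workers: only one chunk can hold at least 100 genes
idx_small <- chunk_indices(150, workers = 4)
vapply(idx_small, length, integer(1))

# 1000 genes and 4 workers: four chunks of 250 genes each
idx_large <- chunk_indices(1000, workers = 4)
vapply(idx_large, length, integer(1))
```

A list built this way could be passed as `i = idx` in the foreach() call above, in the same way as the hand-built list.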


Another option for parallel work is the BiocParallel package.

library(BiocParallel)

Multicore, Snow and BatchJobs backends are supported. We'll use Multicore since you were using doMC.

Register a MulticoreParam with 4 workers.

register(MulticoreParam(workers = 4))

BiocParallel has a family of bp*apply functions that are based on lapply(), sapply(), mapply() etc. but are run in parallel. bplapply() is similar to lapply(); the first argument is a list and each element is passed to FUN.

Create the FUN to be run on each worker.

FUN <- function(i) 
    trigger.mlink(my_loclink, B=100, i=i, seed=666)

Execute bplapply():

res <- bplapply(idx, FUN=FUN)

and we get the same result -

> res
[[1]]
*** TRIGGER object *** 
Marker matrix with  3244 rows and  112 columns 
Expression matrix with  150 rows and  112 columns 

[[2]]
*** TRIGGER object *** 
Marker matrix with  3244 rows and  112 columns 
Expression matrix with  150 rows and  112 columns 


Valerie
