Question: Problem running DEXSeq in parallel using BiocParallel
forestylake wrote (7 months ago):

Hi,

I am trying to analyze differential exon usage with DEXSeq (R version 3.3.3, DEXSeq version 1.20.2). Everything worked well when I tested it on the first 5000 bins of my data, but now I am having problems running the full dataset in parallel. Specifically, I followed the manual and ran the following commands:

BPPARAM = MulticoreParam(workers=4)

dxd=estimateDispersions(dxd, BPPARAM=BPPARAM)

(I'm running it on my laptop, a MacBook Pro with 4 cores running macOS Sierra, in case that is helpful.) But when I checked CPU usage, rsession and RStudio combined were using less than 1% of the CPU. In comparison, if I run dxd=estimateDispersions(dxd) without BPPARAM, CPU usage is close to 100% (I assume that's 100% of one core). So it looks like nothing happens when I try to run this function in parallel.
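
One way to tell whether the parallel backend itself is stuck, independently of DEXSeq, is a tiny smoke test that reports which R process ran each task. This is a minimal sketch that assumes only BiocParallel; check_backend is a hypothetical helper name. The example call uses SnowParam because it works on every platform; swap in MulticoreParam(workers = 4) to test the backend from the question.

```r
library(BiocParallel)

## Hypothetical helper: run n trivial tasks on BPPARAM and collect the
## process ID (PID) of whichever R process executed each one.
check_backend <- function(BPPARAM, n = 4L) {
    pids <- bplapply(seq_len(n), function(i) Sys.getpid(), BPPARAM = BPPARAM)
    list(master = Sys.getpid(), workers = unique(unlist(pids)))
}

## If this call hangs, the parallel backend (not DEXSeq) is the problem.
## If every task reports the master PID, the work ran serially.
info <- check_backend(SnowParam(workers = 2))
print(info)
```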

Any idea what is holding up the process? Also, what is a good way to check on progress? Some older versions of DEXSeq seemed to print a "." every 100 genes, but the current version no longer does that.
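
For a rough progress display, BiocParallel parameter objects accept progressbar = TRUE. Below is a minimal sketch on a toy job. Whether DEXSeq's estimateDispersions() shows this bar usefully depends on how it splits its work internally, which is an assumption here and not something stated in this thread. The tasks argument sets how many chunks the input is split into, and therefore how finely the bar can advance.

```r
library(BiocParallel)

## progressbar = TRUE prints a text progress bar as chunks finish.
## tasks = 20 splits the 20 inputs into 20 chunks, so the bar can move in
## small steps rather than in a few coarse jumps.
BPPARAM <- SnowParam(workers = 2, tasks = 20, progressbar = TRUE)

res <- bplapply(seq_len(20), function(i) {
    Sys.sleep(0.1)  # stand-in for per-gene work
    i^2
}, BPPARAM = BPPARAM)
```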

Thank you so much!!

Here is the session info:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.20.2              RColorBrewer_1.1-2         AnnotationDbi_1.36.2       DESeq2_1.14.1             
 [5] SummarizedExperiment_1.4.0 GenomicRanges_1.26.4       GenomeInfoDb_1.10.3        IRanges_2.8.2             
 [9] S4Vectors_0.12.2           Biobase_2.34.0             BiocGenerics_0.20.0        BiocParallel_1.8.1        

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0   statmod_1.4.29      locfit_1.5-9.1      splines_3.3.3       lattice_0.20-35    
 [6] colorspace_1.3-2    htmltools_0.3.5     base64enc_0.1-3     survival_2.41-2     XML_3.98-1.6       
[11] foreign_0.8-67      DBI_0.6-1           plyr_1.8.4          stringr_1.2.0       zlibbioc_1.20.0    
[16] Biostrings_2.42.1   munsell_0.4.3       gtable_0.2.0        hwriter_1.3.2       htmlwidgets_0.8    
[21] memoise_1.0.0       latticeExtra_0.6-28 knitr_1.15.1        biomaRt_2.30.0      geneplotter_1.52.0 
[26] htmlTable_1.9       Rcpp_0.12.10        acepack_1.4.1       xtable_1.8-2        backports_1.0.5    
[31] scales_0.4.1        checkmate_1.8.2     Hmisc_4.0-2         annotate_1.52.1     XVector_0.14.1     
[36] Rsamtools_1.26.1    gridExtra_2.2.1     ggplot2_2.2.1       digest_0.6.12       stringi_1.1.3      
[41] grid_3.3.3          bitops_1.0-6        tools_3.3.3         magrittr_1.5        lazyeval_0.2.0     
[46] RCurl_1.95-4.8      tibble_1.3.0        RSQLite_1.1-2       Formula_1.2-1       cluster_2.0.6      
[51] Matrix_1.2-8        data.table_1.10.4   rpart_4.1-10        nnet_7.3-12

Johannes Rainer (Italy) wrote (7 months ago):

Hi,

I have the same problem on my MacBook (since macOS 10.12, possibly earlier). My workaround is to use SnowParam instead of MulticoreParam (along with setting options(bphost="localhost")). Sometimes MulticoreParam works, but I haven't yet figured out what causes it to hang.

cheers, jo
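
The SnowParam workaround described above can be sketched as follows. The platform check via Sys.info() is an added convenience and an assumption: it keeps MulticoreParam on Linux, where no hang was reported in this thread, and uses SnowParam on macOS and Windows. The DEXSeq call is left as a comment because it needs the user's own dxd object.

```r
library(BiocParallel)

## Workaround from this thread: socket (SNOW) workers instead of forked
## ones, with workers connecting back to the master via "localhost".
options(bphost = "localhost")

n_workers <- 4L
use_snow <- Sys.info()[["sysname"]] %in% c("Darwin", "Windows")

BPPARAM <- if (use_snow) {
    SnowParam(workers = n_workers, type = "SOCK")
} else {
    MulticoreParam(workers = n_workers)
}

## Then pass it to DEXSeq as in the question (dxd is the user's object):
## dxd <- estimateDispersions(dxd, BPPARAM = BPPARAM)
```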

Thanks a lot, this is really helpful information! macOS 10.12 is giving me more trouble than I expected (face palm).

Reply by forestylake (7 months ago).

s.w.vanderlaan wrote (7 months ago):

@Johannes Rainer

Hi,

What exactly is your code? I'm trying to do this:

SNOWPARAM <- SnowParam(snowWorkers(), type = "SOCK", progressbar = TRUE,
    RNGseed = 9012014, log = TRUE, logdir = QC_loc, resultdir = QC_loc,
    jobname = "QCAEMS450KCOMBO")

When I do this afterwards:

register(SNOWPARAM)

aems450k_RAW_TWIECE <- MethylAid::summarize(targetsTWIECE, batchSize = 100,
    BPPARAM = SNOWPARAM, rp.zero = TRUE, verbose = TRUE,
    file = paste0(QC_loc,"/",Today,"_aems450k_RAW_TWIECE"))

It results in the following error:

Start summarization ...
Summarize data in parallel...
  |                                                                                                          |   0%
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
  |=====================================================                                                     |  50%
Error in close.connection(con) : invalid connection

Thanks!

Sander

Your formulation should in principle work, but there seems to be a bug in BiocParallel's logging when a log directory is provided. The bug is most easily reproduced with

> xx = bplapply(1:4, sqrt, BPPARAM=SnowParam(2, log=TRUE, logdir=tempdir()))
Error in close.connection(con) : invalid connection

Unfortunately, you'll need to avoid it until a bug fix is available (version 1.8.2 in the current release, probably available by the end of the week), e.g., by leaving out logdir:

> xx = bplapply(1:4, sqrt, BPPARAM=SnowParam(2, log=TRUE))
############### LOG OUTPUT ###############
Task: 1
Node: 1
Timestamp: 2017-04-04 21:18:45
Success: TRUE
Task duration:
   user  system elapsed 
      0       0       0 
Memory used:
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  601197 32.2    1168576 62.5  1168576 62.5
Vcells 1063541  8.2    1866454 14.3  1447373 11.1
Log messages:

stderr and stdout:
character(0)
############### LOG OUTPUT ###############
Task: 2
Node: 2
Timestamp: 2017-04-04 21:18:45
Success: TRUE
Task duration:
   user  system elapsed 
  0.000   0.000   0.001 
Memory used:
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  600940 32.1    1168576 62.5  1168576 62.5
Vcells 1063424  8.2    1866454 14.3  1447373 11.1
Log messages:

stderr and stdout:
character(0)
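
Building on the reply above, one way to keep file logging in a script that may run on either side of the fix is to pass logdir only when the installed BiocParallel is at least 1.8.2, the version named above. This is a sketch: make_snow_param is a hypothetical helper, and the temporary directory stands in for a real log location such as QC_loc.

```r
library(BiocParallel)

## Hypothetical helper: logdir triggered "invalid connection" before
## BiocParallel 1.8.2 (per the reply above), so only pass it on
## versions that include the fix; older versions log to the console.
make_snow_param <- function(workers, logdir) {
    if (packageVersion("BiocParallel") >= "1.8.2") {
        SnowParam(workers = workers, log = TRUE, logdir = logdir)
    } else {
        SnowParam(workers = workers, log = TRUE)
    }
}

log_dir <- file.path(tempdir(), "bplogs")  # stand-in for e.g. QC_loc
dir.create(log_dir, showWarnings = FALSE)

xx <- bplapply(1:4, sqrt, BPPARAM = make_snow_param(2, log_dir))
```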

Reply by Martin Morgan (7 months ago).