Problem running DEXSeq in parallel using BiocParallel
forestylake ▴ 20
@forestylake-12745
Last seen 7.1 years ago

Hi,

I am trying to analyze differential exon usage with DEXSeq (R version 3.3.3, DEXSeq version 1.20.2). Everything worked well when I tested with the first 5000 bins of my data, but now I am having problems running the full dataset in parallel. Specifically, I followed the manual and ran the following commands:

BPPARAM = MulticoreParam(workers=4)

dxd=estimateDispersions(dxd, BPPARAM=BPPARAM)

(I'm running this on my laptop, a 4-core MacBook Pro with macOS Sierra, in case that is relevant.) When I checked CPU usage, rsession and RStudio combined were using less than 1% of the CPU. In comparison, if I run dxd=estimateDispersions(dxd), CPU usage is close to 100% (I assume that is 100% of one core). So it looks like nothing happens when I try to run this function in parallel.

Any idea what is holding up the process? Also, what is a good way to check on progress? Some older versions of DEXSeq seemed to print a "." every 100 genes, which the current version no longer does.

 

Thank you so much!!

 

Here is the session info:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.20.2              RColorBrewer_1.1-2         AnnotationDbi_1.36.2       DESeq2_1.14.1             
 [5] SummarizedExperiment_1.4.0 GenomicRanges_1.26.4       GenomeInfoDb_1.10.3        IRanges_2.8.2             
 [9] S4Vectors_0.12.2           Biobase_2.34.0             BiocGenerics_0.20.0        BiocParallel_1.8.1        

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0   statmod_1.4.29      locfit_1.5-9.1      splines_3.3.3       lattice_0.20-35    
 [6] colorspace_1.3-2    htmltools_0.3.5     base64enc_0.1-3     survival_2.41-2     XML_3.98-1.6       
[11] foreign_0.8-67      DBI_0.6-1           plyr_1.8.4          stringr_1.2.0       zlibbioc_1.20.0    
[16] Biostrings_2.42.1   munsell_0.4.3       gtable_0.2.0        hwriter_1.3.2       htmlwidgets_0.8    
[21] memoise_1.0.0       latticeExtra_0.6-28 knitr_1.15.1        biomaRt_2.30.0      geneplotter_1.52.0 
[26] htmlTable_1.9       Rcpp_0.12.10        acepack_1.4.1       xtable_1.8-2        backports_1.0.5    
[31] scales_0.4.1        checkmate_1.8.2     Hmisc_4.0-2         annotate_1.52.1     XVector_0.14.1     
[36] Rsamtools_1.26.1    gridExtra_2.2.1     ggplot2_2.2.1       digest_0.6.12       stringi_1.1.3      
[41] grid_3.3.3          bitops_1.0-6        tools_3.3.3         magrittr_1.5        lazyeval_0.2.0     
[46] RCurl_1.95-4.8      tibble_1.3.0        RSQLite_1.1-2       Formula_1.2-1       cluster_2.0.6      
[51] Matrix_1.2-8        data.table_1.10.4   rpart_4.1-10        nnet_7.3-12

Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 4 weeks ago
Italy

Hi,

I have the same problem on my MacBook (since macOS 10.12, possibly even earlier). My workaround is to use SnowParam instead of MulticoreParam (along with setting options(bphost="localhost")). Sometimes MulticoreParam works, but I have not yet figured out what causes it to hang.
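
A minimal sketch of this workaround, assuming BiocParallel is installed. The worker count of 2 and the `sqrt` toy task are placeholders rather than part of the original setup, and `progressbar = TRUE` is only one possible way to watch progress:

```r
library(BiocParallel)

# Make the socket workers connect back to the manager via localhost
options(bphost = "localhost")

# Socket-based back end instead of forked MulticoreParam workers
param <- SnowParam(workers = 2, type = "SOCK", progressbar = TRUE)

# Toy task to confirm that the workers start and return results
res <- bplapply(1:4, sqrt, BPPARAM = param)

# With a DEXSeqDataSet `dxd`, the same object would be passed on:
# dxd <- estimateDispersions(dxd, BPPARAM = param)
```

If the toy call returns promptly, the same `param` object can be tried with the real dataset.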
 

cheers, jo


Thanks a lot, this is really helpful information! macOS 10.12 is giving me more trouble than I expected (face palm).

 

@swvanderlaan-12768
Last seen 5.6 years ago

@Johannes Rainer

Hi,

What is your code exactly? I'm trying to do this: 

SNOWPARAM <- SnowParam(snowWorkers(), type = "SOCK", progressbar = TRUE,
    RNGseed = 9012014, log = TRUE, logdir = QC_loc, resultdir = QC_loc,
    jobname = "QCAEMS450KCOMBO")

When I do this afterwards:

register(SNOWPARAM)

aems450k_RAW_TWIECE <- MethylAid::summarize(targetsTWIECE, batchSize = 100,
    BPPARAM = SNOWPARAM, rp.zero = TRUE, verbose = TRUE,
    file = paste0(QC_loc,"/",Today,"_aems450k_RAW_TWIECE"))

It results in the following error:

Start summarization ...
Summarize data in parallel...
  |                                                                                                          |   0%
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
  |=====================================================                                                     |  50%
Error in close.connection(con) : invalid connection

Thanks!

Sander


Your formulation should in principle work, but there seems to be a bug in BiocParallel's logging when a log directory is provided. The bug is most easily reproduced with

> xx = bplapply(1:4, sqrt, BPPARAM=SnowParam(2, log=TRUE, logdir=tempdir()))
Error in close.connection(con) : invalid connection

Unfortunately, you'll need to avoid setting logdir until a bug fix is available (version 1.8.2 in the current release, probably available by the end of the week), e.g.,

> xx = bplapply(1:4, sqrt, BPPARAM=SnowParam(2, log=TRUE))
############### LOG OUTPUT ###############
Task: 1
Node: 1
Timestamp: 2017-04-04 21:18:45
Success: TRUE
Task duration:
   user  system elapsed 
      0       0       0 
Memory used:
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  601197 32.2    1168576 62.5  1168576 62.5
Vcells 1063541  8.2    1866454 14.3  1447373 11.1
Log messages:

stderr and stdout:
character(0)
############### LOG OUTPUT ###############
Task: 2
Node: 2
Timestamp: 2017-04-04 21:18:45
Success: TRUE
Task duration:
   user  system elapsed 
  0.000   0.000   0.001 
Memory used:
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  600940 32.1    1168576 62.5  1168576 62.5
Vcells 1063424  8.2    1866454 14.3  1447373 11.1
Log messages:

stderr and stdout:
character(0)
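
Applied to the SnowParam call from the question, this could look roughly like the sketch below. It assumes BiocParallel is installed; the worker count of 2 and the `sqrt` toy task are placeholders, and `resultdir` is left out only to keep the example self-contained (whether it is affected by the bug is not known here):

```r
library(BiocParallel)

# Settings from the question, minus logdir, so the log
# is printed to the console instead of written to files
SNOWPARAM <- SnowParam(workers = 2, type = "SOCK", progressbar = TRUE,
    RNGseed = 9012014, log = TRUE, jobname = "QCAEMS450KCOMBO")

# Toy task standing in for the real summarization call
xx <- bplapply(1:4, sqrt, BPPARAM = SNOWPARAM)
```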

 
