Question: nbinomLRT function all cores usage
mcalgaro930 wrote:

Hi, I have a question about core usage during the DESeq2 differential expression pipeline. The issue occurs when I call the nbinomLRT function.
The object I give to the function is:
 

> ddsDisp
class: DESeqDataSet 
dim: 843 100 
metadata(1): version
assays(2): counts mu
rownames(843): OTU_2 OTU_3 ... OTU_970 OTU_971
rowData names(9): baseMean baseVar ... dispOutlier dispMAP
colnames(100): Sample_1_grp1 Sample_2_grp1 ... Sample_99_grp2 Sample_100_grp2
colData names(3): grp NF.poscounts sizeFactor

So it is a matrix with 50 samples from experimental condition grp1 and 50 samples from grp2 (100 samples in total) and 843 rows.
And I call the function:

nbinomLRT(ddsDisp, reduced = ~ 1, full = ~ grp)

As I have to launch a lot of simulations on a server, I need all calculations to stay on a single core. So at the beginning of the script I've used:

register(SerialParam())

But things turn out differently: when the script reaches this function, all 20 cores of the server are saturated and the wait for a result is more than 7 minutes (for an 843x100 matrix, isn't that strange?)

I've already tried calling the wrapper DESeq instead of the separate functions:

ddsRes <- DESeq(object = dds, test = "LRT", reduced = ~1, full = ~ grp, parallel = FALSE)
# or even this
ddsRes <- DESeq(object = dds, test = "LRT", reduced = ~1, full = ~ grp, parallel = TRUE, BPPARAM = MulticoreParam(1))

My guess is that, during the QR decomposition inside nbinomLRT, the sample size (100) of the dataset is somehow too big and all cores get involved, because with smaller sample sizes (10, 20, 50) the problem doesn't occur. That's why I tried setting the option useQR to FALSE; this lowered the waiting time but did not solve the problem of all cores being used. Is there something I can do to avoid using all the cores?

Here is my sessionInfo() (I know there is a newer version of R, but on the server I have to use this one :( ):

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRblas.so
LAPACK: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] crayon_1.3.4               bindrcpp_0.2.2             Seurat_2.3.0               cowplot_0.9.3             
 [5] ggplot2_3.0.0              scde_1.99.1                flexmix_2.3-13             lattice_0.20-35           
 [9] MAST_1.4.1                 genefilter_1.60.0          AUC_0.3.0                  BiocParallel_1.12.0       
[13] zinbwave_1.0.0             SingleCellExperiment_1.0.0 samr_2.0                   impute_1.52.0             
[17] ROCR_1.0-7                 gplots_3.0.1               reshape2_1.4.3             plyr_1.8.4                
[21] phyloseq_1.22.3            metagenomeSeq_1.20.1       RColorBrewer_1.1-2         glmnet_2.0-16             
[25] foreach_1.4.4              Matrix_1.2-14              DESeq2_1.20.0              SummarizedExperiment_1.8.1
[29] DelayedArray_0.4.1         matrixStats_0.54.0         Biobase_2.38.0             GenomicRanges_1.30.3      
[33] GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       
[37] edgeR_3.20.9               limma_3.34.9               RevoUtils_10.0.9           RevoUtilsMath_10.0.1      

loaded via a namespace (and not attached):
  [1] SparseM_1.77              prabclus_2.2-6            ModelMetrics_1.2.0        R.methodsS3_1.7.1        
  [5] tidyr_0.8.1               acepack_1.4.1             bit64_0.9-7               knitr_1.20               
  [9] irlba_2.3.2               R.utils_2.7.0             Rook_1.1-1                data.table_1.11.8        
 [13] rpart_4.1-13              RCurl_1.95-4.11           metap_1.0                 snow_0.4-3               
 [17] RSQLite_2.1.1             RANN_2.6                  VGAM_1.0-6                proxy_0.4-22             
 [21] bit_1.1-14                lubridate_1.7.4           assertthat_0.2.0          gower_0.1.2              
 [25] RMTstat_0.3               hms_0.4.2                 DEoptimR_1.0-8            caTools_1.17.1.1         
 [29] readxl_1.1.0              igraph_1.2.2              DBI_1.0.0                 geneplotter_1.56.0       
 [33] htmlwidgets_1.3           ddalpha_1.3.4             RcppArmadillo_0.9.100.5.0 purrr_0.2.5              
 [37] dplyr_0.7.6               backports_1.1.2           permute_0.9-4             trimcluster_0.1-2.1      
 [41] annotate_1.56.2           gbRd_0.4-11               quantreg_5.36             Cairo_1.5-9              
 [45] abind_1.4-5               caret_6.0-80              withr_2.1.2               sfsmisc_1.1-2            
 [49] robustbase_0.93-3         checkmate_1.8.5           vegan_2.5-2               mclust_5.4.1             
 [53] softImpute_1.4            cluster_2.0.7-1           gsl_1.9-10.3              segmented_0.5-3.0        
 [57] ape_5.2                   ADGofTest_0.3             diffusionMap_1.1-0.1      lazyeval_0.2.1           
 [61] recipes_0.1.3             pkgconfig_2.0.2           nlme_3.1-131.1            nnet_7.3-12              
 [65] bindr_0.1.1               rlang_0.2.2               diptest_0.75-7            pls_2.7-0                
 [69] MatrixModels_0.4-1        extRemes_2.0-9            doSNOW_1.0.16             cellranger_1.1.0         
 [73] lmtest_0.9-36             distillery_1.0-4          carData_3.0-2             zoo_1.8-4                
 [77] base64enc_0.1-3           ggridges_0.5.1            png_0.1-7                 rjson_0.2.20             
 [81] stabledist_0.7-1          bitops_1.0-6              R.oo_1.22.0               Lmoments_1.2-3           
 [85] KernSmooth_2.23-15        Biostrings_2.46.0         blob_1.1.1                DRR_0.0.3                
 [89] lars_1.2                  stringr_1.3.1             brew_1.0-6                scales_1.0.0             
 [93] ica_1.0-2                 memoise_1.1.0             magrittr_1.5              bibtex_0.4.2             
 [97] gdata_2.18.0              zlibbioc_1.24.0           compiler_3.4.4            lsei_1.2-0               
[101] pcaMethods_1.70.0         dimRed_0.1.0              fitdistrplus_1.0-11       ade4_1.7-13              
[105] dtw_1.20-1                XVector_0.18.0            pbapply_1.3-4             htmlTable_1.12           
[109] magic_1.5-9               Formula_1.2-3             MASS_7.3-49               mgcv_1.8-23              
[113] tidyselect_0.2.5          stringi_1.2.4             forcats_0.3.0             copula_0.999-18          
[117] yaml_2.2.0                locfit_1.5-9.1            latticeExtra_0.6-28       grid_3.4.4               
[121] tools_3.4.4               rio_0.5.10                rstudioapi_0.8            foreign_0.8-69           
[125] gridExtra_2.3             prodlim_2018.04.18        scatterplot3d_0.3-41      Rtsne_0.13               
[129] digest_0.6.18             FNN_1.1.2.1               lava_1.6.3                fpc_2.1-11.1             
[133] Rcpp_0.12.19              car_3.0-2                 broom_0.5.0               SDMTools_1.1-221         
[137] AnnotationDbi_1.40.0      npsurv_0.4-0              kernlab_0.9-27            Rdpack_0.10-1            
[141] colorspace_1.3-2          ranger_0.10.1             XML_3.98-1.16             CVST_0.2-2               
[145] splines_3.4.4             RcppRoll_0.3.0            multtest_2.34.0           xtable_1.8-3             
[149] jsonlite_1.5              geometry_0.3-6            timeDate_3043.102         modeltools_0.2-22        
[153] ipred_0.9-7               tclust_1.4-1              R6_2.2.2                  Hmisc_4.1-1              
[157] pillar_1.3.0              htmltools_0.3.6           glue_1.3.0                pspline_1.0-18           
[161] class_7.3-14              codetools_0.2-15          tsne_0.1-3                pcaPP_1.9-73             
[165] mvtnorm_1.0-8             tibble_1.4.2              mixtools_1.1.0            numDeriv_2016.8-1        
[169] curl_3.2                  gtools_3.8.1              zip_1.0.0                 openxlsx_4.1.0           
[173] survival_2.41-3           biomformat_1.6.0          munsell_0.5.0             rhdf5_2.22.0             
[177] GenomeInfoDbData_1.0.0    iterators_1.0.10          haven_1.1.2               gtable_0.2.0         

I thank you in advance for your help,
Matteo

Answer: nbinomLRT function all cores usage
Weill Cornell Medicine
davide risso810 wrote:

Ciao Matteo,

as you can see from your sessionInfo(), you are using a non-default implementation of the BLAS linear algebra library (specifically the Microsoft/Revolution implementation: note the BLAS and LAPACK paths under /opt/microsoft/ropen and the attached RevoUtilsMath package). This implementation is multithreaded and, by default, uses all available cores.

DESeq2 does some matrix algebra, so those calls go through the multithreaded BLAS and your system ends up using all the available cores. BiocParallel has nothing to do with it.
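A quick way to check which BLAS/LAPACK libraries a session is linked against, and whether a plain matrix operation on its own spreads over several cores, might look like the sketch below. It uses only base R; the helper name `blas_info` and the 500 x 500 matrix size are illustrative assumptions, not something from DESeq2 or from the original post.

```r
# Report the BLAS/LAPACK shared libraries used by this R session.
# sessionInfo() carries these fields from R 3.4.0 onwards; they can be
# empty on some platforms, so fall back to NA in that case.
blas_info <- function() {
  si <- sessionInfo()
  pick <- function(x) {
    if (length(x) != 1L || !nzchar(x)) NA_character_ else x
  }
  c(BLAS = pick(si$BLAS), LAPACK = pick(si$LAPACK))
}

print(blas_info())

# Time a cross-product, which is delegated to BLAS. With no DESeq2 or
# BiocParallel involved, any multi-core activity here comes from BLAS.
set.seed(1)
x <- matrix(rnorm(500 * 500), nrow = 500)
timing <- system.time(xtx <- crossprod(x))
print(timing)
```

On Linux, a "user" time well above the "elapsed" time in the `system.time()` output suggests that the work was spread over several threads.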

At this link, you can find the instructions on how to change the default number of cores: https://mran.microsoft.com/documents/rro/multithread

Luckily, it can all be done within R so just adding the following lines to your script should solve your problem.

library(RevoUtilsMath)
setMKLthreads(1)
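If the same script also has to run on machines without Microsoft R Open, the call can be guarded so that it does not fail where RevoUtilsMath is missing. This is only a sketch: the wrapper name `limit_mkl_threads` is hypothetical, and it assumes that `setMKLthreads()` and `getMKLthreads()` are available whenever RevoUtilsMath is installed.

```r
# Limit the MKL thread count when RevoUtilsMath is available.
# Returns TRUE (invisibly) if the limit was applied, FALSE otherwise.
limit_mkl_threads <- function(n = 1L) {
  if (!requireNamespace("RevoUtilsMath", quietly = TRUE)) {
    message("RevoUtilsMath is not installed; thread count left unchanged.")
    return(invisible(FALSE))
  }
  RevoUtilsMath::setMKLthreads(n)
  message("MKL threads now: ", RevoUtilsMath::getMKLthreads())
  invisible(TRUE)
}

# Call this near the top of the script, before any DESeq2 function.
ok <- limit_mkl_threads(1L)
```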

Thank you very much Davide, you solved my problem. :)

written 5 months ago by mcalgaro930
Answer: nbinomLRT function all cores usage
United States
Michael Love22k wrote:

Hi Matteo,

I’m a bit confused because there are no parallel calls aside from DESeq(..., parallel=TRUE).

The sub functions are not making use of multiple workers or making calls to BiocParallel.
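To confirm from within a session that BiocParallel is not the source of the extra threads, the registered backend and its worker count can be inspected. The sketch below assumes BiocParallel may or may not be installed; the helper name `check_bioc_backend` is hypothetical.

```r
# Register a serial backend and report how many workers BiocParallel
# would use by default. Returns NA if BiocParallel is not installed.
check_bioc_backend <- function() {
  if (!requireNamespace("BiocParallel", quietly = TRUE)) {
    message("BiocParallel is not installed.")
    return(NA_integer_)
  }
  # Same effect as register(SerialParam()) at the top of a script.
  BiocParallel::register(BiocParallel::SerialParam())
  backend <- BiocParallel::bpparam()
  message("Registered backend: ", class(backend)[1])
  BiocParallel::bpnworkers(backend)
}

n_workers <- check_bioc_backend()
print(n_workers)
```

If this reports a single worker while the cores are still saturated, the threads are coming from somewhere other than BiocParallel.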


I'm confused too, which is why I asked the question. I've added a link to a GIF to show you :)

https://drive.google.com/file/d/1VCTiow9GUg-Fiu2xIQ-5UDCJYNMfH5Ek/view?usp=sharing

Now I'll run a check on another PC to see whether the issue is specific to the server.

written 5 months ago by mcalgaro930

Ok. I guess I can’t offer much more advice than to say that unless you use parallel=TRUE when running DESeq(), we are making no hidden use of BiocParallel.

 

written 5 months ago by Michael Love22k

You're using an old version of the software and have a ton of other packages attached, so basic debugging steps, though painful, will be to update to the current version of R / DESeq2 and to perform the analysis in a new session with only the essential packages.

written 5 months ago by Martin Morgan ♦♦ 23k