WGCNA parallelization (multi-threading) for blockwiseModules and TOMsimilarity
bioming ▴ 10
@bioming-21835
Last seen 17 months ago
Queen's University

Hello,

I'm currently using WGCNA v1.68 to do network analysis on 50k probes. I have a few questions regarding parallelization in WGCNA, particularly when running the blockwiseModules and TOMsimilarity functions. I came across another Bioconductor question on this topic (https://support.bioconductor.org/p/86147/), but it was from 3 years ago, so I was wondering if there are any updates I should be aware of?

In the previous question, Peter said that blockwiseModules was not parallelized; has this changed? He kindly suggested using a faster BLAS to speed up the matrix multiplication in the TOM calculations. Currently, my output when running TOMsimilarity() shows "..matrix multiplication (system BLAS)..", so I'm guessing the system BLAS is not the fast BLAS Peter was referring to? Does anyone know which fast BLAS I should try installing? (I'm currently running R on a CentOS server with up to 50 cores.)

Much thanks for any help anyone can provide,

Ming

@peter-langfelder-4469
Last seen 12 months ago
United States

I'll try to explain this as best as I can. When calculating TOM from expression data, the WGCNA package does some parallelization, but only in the correlation calculations, and even those lead to a noticeable speedup only when there are many missing values in the expression data, which is rare these days. When there are no missing data, the most time-consuming step is the matrix multiplication of the adjacency matrix with itself. WGCNA performs this via a call to a BLAS routine unless the argument useInternalMatrixAlgebra is TRUE (by default it is FALSE), in which case the matrix multiplication is performed by a slow WGCNA-internal routine. I do not recommend that route unless you have a good reason to suspect that your BLAS libraries are buggy.
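For reference, a minimal sketch of the call path being discussed (datExpr and the soft-thresholding power are placeholders for your own data and chosen power):

```r
library(WGCNA)

# datExpr: samples x genes expression matrix (placeholder name)
adj <- adjacency(datExpr, power = 6)

# Default: the adjacency is multiplied with itself via whatever BLAS R links to
tom <- TOMsimilarity(adj)

# Slow fallback using WGCNA's own multiplication routine; only worth trying
# if you suspect the system BLAS is buggy
# tom <- TOMsimilarity(adj, useInternalMatrixAlgebra = TRUE)
```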

When WGCNA reports "system BLAS", it means it uses whatever BLAS R was compiled against. You can see which one that is by running sessionInfo(); mine reports the line

BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.7.so

signifying that my R is compiled against OpenBLAS, which is quite fast.

Depending on what system you are on and whether you have administrative privileges, getting R to work with a fast BLAS may be trivial or very complicated. I recommend starting with the R Installation and Administration manual at https://cran.r-project.org/doc/manuals/r-release/R-admin.html, specifically the BLAS section at https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS.
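As a starting point, a sketch of how one might check which BLAS R is currently linked against (commands and paths are examples; on a managed cluster the actual switch is typically handled via environment modules or by the administrators):

```shell
# Report the BLAS flags R was configured/built with
R CMD config BLAS_LIBS

# Recent R versions print the BLAS/LAPACK library paths in sessionInfo()
Rscript -e 'sessionInfo()'
```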


Hi Peter, thank you so much for your fast response; I understand more now. When I run sessionInfo() it indeed shows the generic Rblas. I'm working on a computing cluster, and getting R to switch to OpenBLAS seems to be complicated, as you foresaw.

On another note regarding blockwiseModules, I just wanted to confirm the parallelization inside this function. I tried testing on some BRCA data (590 subjects x 8640 genes), and ran with:

1. nThreads = 1 and maxBlockSize = 5000 (2 blocks): took 6 min
2. nThreads = 18 and maxBlockSize = 5000 (2 blocks): also 6 min
3. nThreads = 1 and maxBlockSize = 500 (18 blocks): also 6 min
4. nThreads = 18 and maxBlockSize = 500 (18 blocks): took 3 min 47 sec

Am I correct in assuming that the 18 blocks, when given enough threads, will execute in parallel, but that the TOM calculation inside blockwiseModules is the part that has not been parallelized and would benefit from OpenBLAS? So for a large dataset, running blockwiseModules together with OpenBLAS would be the best way to go?
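For reference, the calls I'm comparing look roughly like this (datExpr and the power value are placeholders for my actual BRCA matrix and soft-thresholding power):

```r
library(WGCNA)

# Enable WGCNA's own multi-threading (used in the correlation step)
enableWGCNAThreads(nThreads = 18)

# datExpr: 590 subjects x 8640 genes (placeholder name)
net <- blockwiseModules(datExpr,
                        power = 6,
                        maxBlockSize = 500,  # gives 18 blocks in my test
                        nThreads = 18)
```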

Ming