Question

WGCNA - How to prevent "blockwiseModules" command from dividing genes into multiple blocks?

gokce.ouz ▴ 70

Hi,

I am using RNA-Seq data for WGCNA, with 34 samples. For the analysis I am using networkType = "signed hybrid", TOMType = "signed", corType = "bicor", pearsonFallback = "individual" and deepSplit = 2.

nethybrid.2 = blockwiseModules(datExpr, power = softpower, maxBlockSize = 46000,
                               TOMType = "signed", minModuleSize = 30, deepSplit = 2,
                               reassignThreshold = 0, mergeCutHeight = 0.25,
                               numericLabels = TRUE, pamRespectsDendro = FALSE,
                               saveTOMs = TRUE, networkType = "signed hybrid",
                               saveTOMFileBase = "34patient_signedhybrid_TOM_46000",
                               verbose = 5, corType = "bicor", maxPOutliers = 0.1,
                               pearsonFallback = "individual")

I know the maximum number of genes WGCNA can analyze is 46000, which is why I reduced my gene set to 45901. My aim is to analyse them all together in one block to get a single TOM file for further network analysis. However, when I run the code above, my genes are divided into 2 blocks. Is there any way to prevent this division into multiple blocks? Or is it only possible if I follow the step-by-step WGCNA tutorials? I tried that as well, but it hangs every time.

When I include saveTOMs = TRUE, saveTOMFileBase = "34patient_signedhybrid_TOM_46000" to save the TOM, it runs for a very long time and never finishes. Is there any way to optimize it? I read the post "WGCNA blockwiseModules parallelisation question", which suggests a fast BLAS library, but I thought that, as my data is much smaller than theirs, there might be another solution as well.

Thanks in advance,

Gokce

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v79_0.99.12 ensembldb_1.4.7           
 [3] edgeR_3.14.0               limma_3.28.21             
 [5] amap_0.8-14                sva_3.20.0                
 [7] mgcv_1.8-15                nlme_3.1-128              
 [9] doParallel_1.0.10          iterators_1.0.8           
[11] foreach_1.4.3              reshape_0.8.5             
[13] cluster_2.0.4              matrixStats_0.50.2        
[15] flashClust_1.01-2          WGCNA_1.51                
[17] fastcluster_1.1.21         dynamicTreeCut_1.63-1     
[19] pheatmap_1.0.8             genefilter_1.54.2         
[21] gplots_3.0.1               RColorBrewer_1.1-2        
[23] vsn_3.40.0                 org.Hs.eg.db_3.3.0        
[25] DESeq2_1.12.4              BiocParallel_1.6.6        
[27] GenomicAlignments_1.8.4    SummarizedExperiment_1.2.3
[29] GenomicFeatures_1.24.5     AnnotationDbi_1.34.4      
[31] Biobase_2.32.0             Rsamtools_1.24.0          
[33] Biostrings_2.40.2          XVector_0.12.1            
[35] GenomicRanges_1.24.3       GenomeInfoDb_1.8.7        
[37] IRanges_2.6.1              S4Vectors_0.10.3          
[39] BiocGenerics_0.18.0        Hmisc_3.17-4              
[41] ggplot2_2.1.0              Formula_1.2-1             
[43] survival_2.39-5            lattice_0.20-34           

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    AnnotationHub_2.4.2          
 [3] splines_3.3.1                 gtools_3.5.0                 
 [5] shiny_0.14                    interactiveDisplayBase_1.10.3
 [7] affy_1.50.0                   latticeExtra_0.6-28          
 [9] impute_1.46.0                 RSQLite_1.0.0                
[11] digest_0.6.10                 chron_2.3-47                 
[13] colorspace_1.2-6              httpuv_1.3.3                 
[15] htmltools_0.3.5               preprocessCore_1.34.0        
[17] Matrix_1.2-7.1                plyr_1.8.4                   
[19] XML_3.98-1.4                  biomaRt_2.28.0               
[21] zlibbioc_1.18.0               xtable_1.8-2                 
[23] GO.db_3.3.0                   scales_0.4.0                 
[25] gdata_2.17.0                  affyio_1.42.0                
[27] annotate_1.50.0               nnet_7.3-12                  
[29] mime_0.5                      foreign_0.8-67               
[31] BiocInstaller_1.22.3          tools_3.3.1                  
[33] data.table_1.9.6              munsell_0.4.3                
[35] locfit_1.5-9.1                caTools_1.17.1               
[37] grid_3.3.1                    RCurl_1.95-4.8               
[39] bitops_1.0-6                  gtable_0.2.0                 
[41] codetools_0.2-14              DBI_0.5-1                    
[43] R6_2.1.3                      gridExtra_2.2.1              
[45] rtracklayer_1.32.2            KernSmooth_2.23-15           
[47] Rcpp_0.12.7                   geneplotter_1.50.0           
[49] rpart_4.1-10                  acepack_1.3-3.3

WGCNA RNA-Seq • 4.1k views

ADD COMMENT • link updated 7.6 years ago by Peter Langfelder ★ 3.0k • written 7.6 years ago by gokce.ouz ▴ 70

Hi, some questions:

What do you mean by 'it never finishes'? How long did you wait, and how did you figure out that the program was stalling?

Why do you say that WGCNA can analyse at most 46K genes?

ADD REPLY • link 7.6 years ago Marge ▴ 10

Hi Marge,

On our server/cluster, users have a limited time (24 hours) to use R in the interactive queue. That is why I wrote "it never finishes". In other words, it is a problem with my connection, not with the program.

The 46K maximum, explained below by Dr. Peter Langfelder, is the number of genes WGCNA can handle per block. For example, you can still analyse 460K genes, but you need to do it in at least 10 blocks.

ADD REPLY • link 7.6 years ago gokce.ouz ▴ 70

Thanks a lot for the explanations (to you both).

Best,

Marge

ADD REPLY • link 7.6 years ago Marge ▴ 10

score 1 · Answer 1 · 2016-10-03

Peter Langfelder ★ 3.0k

You need to set the argument 'maxBlockSize' for blockwiseModules to something larger than your number of genes. But do note that to analyze 45k genes in a single block, you need a workstation with at least 64GB of available RAM; 96GB or more may be preferable. If you have less RAM, the computer will start swapping to disk (using the disk as "supplemental" RAM), which is orders of magnitude slower and feels like the calculation never finishes.

Also note that saving a TOM that is around 10GB in file size will take a long time (depending on how fast the file system is).
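For a rough sense of where these numbers come from, here is a back-of-the-envelope sketch in base R. It assumes 8-byte doubles and, for the file size, that the saved TOM holds roughly the lower triangle of the matrix before any compression; both are assumptions for illustration, not a statement about the exact WGCNA file layout. The variable names are made up for this example.

```r
# Rough memory estimate for one dense gene-by-gene matrix (illustrative only)
n_genes <- 45901          # gene count from the question
bytes_per_double <- 8     # assumption: entries are stored as 8-byte doubles

# Full square matrix, e.g. a correlation, adjacency or TOM matrix held in RAM
full_matrix_gb <- n_genes^2 * bytes_per_double / 1e9

# Lower triangle only: n * (n - 1) / 2 entries (assumed layout of the saved TOM)
lower_tri_gb <- n_genes * (n_genes - 1) / 2 * bytes_per_double / 1e9

cat(sprintf("full matrix: %.1f GB, lower triangle: %.1f GB\n",
            full_matrix_gb, lower_tri_gb))
```

Since a calculation like this typically needs several such matrices in memory at once, a single matrix of about 17 GB is consistent with the 64GB recommendation above.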

Indeed, because of the way the compiled code is called, WGCNA cannot at present analyze more than sqrt(2^31) ~ 46300 genes in one block; but it can analyze more in the block-by-block manner.
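The limit quoted above can be reproduced with a few lines of base R. This is a sketch under the assumption that the bound comes from indexing an n x n matrix with a signed 32-bit integer, which is one reading of "the way the compiled code is called"; the helper blocks_needed() is hypothetical and not part of WGCNA.

```r
# Largest n such that n^2 stays below 2^31 (assumed 32-bit index limit)
max_block <- floor(sqrt(2^31))

n_genes <- 45901                      # gene count from the question
fits_in_one_block <- n_genes <= max_block

# Hypothetical helper: minimum number of blocks for a given gene count
blocks_needed <- function(n, max_block_size) {
  ceiling(n / max_block_size)
}

print(max_block)                          # 46340
print(fits_in_one_block)                  # TRUE
print(blocks_needed(460000, max_block))   # 10
```

So 45901 genes fall just under the per-block limit, while 460K genes would need at least 10 blocks.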

ADD COMMENT • link 7.6 years ago Peter Langfelder ★ 3.0k

Thank you for your answer, Peter. While importing results from WGCNA into external network programs, I found it difficult to use multiple TOMs (I am still a newbie in the field). Even though genes in different blocks have zero TOM (see "WGCNA cytoscape export question"), I wanted to run the analysis in a single block to be safe.

By the way, I was normally using EnsDb.Hsapiens.v79 for annotation, which returned ~46K genes. When I used org.Hs.eg.db, the number of genes decreased to ~23K. Do you have a preference between these two packages, or should I look at the results from both?

Thanks in advance,

Gokce

ADD REPLY • link 7.6 years ago gokce.ouz ▴ 70