WGCNA: Topological Overlap Matrix (TOM) Values Greater Than 1
teresisc ▴ 10
@teresisc-24311

Hello,

I am running WGCNA on an expression matrix. I recently moved from running the scripts on my work computer to a computing cluster, and at the same time from testing the code on smaller datasets to the full dataset. I use the same package versions on both systems.

I have successfully worked through the WGCNA tutorials (from the main WGCNA website) using a subset of my data (about 40K of the ~100K rows of my expression matrix). However, when I scale up to the full dataset (~100K rows), I get a non-critical message when calculating the Topological Overlap Matrix (TOM) with TOMsimilarity(adjacency), where adjacency is the adjacency matrix returned by the adjacency() function.

The code does not crash, but it prints "problem: *arbitrary number here* TOM entries are larger than 1" to the console, and the number of such entries only increases as I add more rows to the input expression matrix. I am quite confused, because my understanding is that the values in the TOM should be bounded between 0 and 1. I have also been unable to find any other reports of this problem on the web, so I am at a loss as to what to do.

Does anyone have any suggestions or advice on what I ought to do? I don't feel like I should continue with the analysis given the nature of the TOM definition, and I haven't been able to find any clues in the documentation for the function...

Thank you for your help!


Package Info: WGCNA 1.69

@peter-langfelder-4469
United States

I would check the resulting TOM. Theoretically TOM is indeed bounded between 0 and 1 but numerical errors can sometimes cause it to be slightly larger than 1. Calculate max(TOM-1); if this is less than say 1e-10, don't worry about it and continue as usual. However, if max(TOM-1) is of order say 1e-3 or larger, it likely means a problem with the underlying code. If you use a fast BLAS implementation, these have been known to "occasionally" (in some versions and on some architectures) produce errors or erroneous results.
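For readers who want to script this check, here is a minimal sketch in base R. The helper name check_tom_excess and the "borderline" category for values between the two cutoffs are assumptions made for illustration; they are not part of WGCNA. The toy 3 x 3 matrices only stand in for a real TOM.

```r
# Hypothetical helper (not part of WGCNA): report how far a TOM exceeds 1.
# Cutoffs follow the rule of thumb above; anything in between is a judgment call.
check_tom_excess <- function(tom, tol_ok = 1e-10, tol_bad = 1e-3) {
  excess <- max(tom - 1, na.rm = TRUE)   # the max(TOM - 1) quantity
  n_over <- sum(tom > 1, na.rm = TRUE)   # number of entries above 1
  verdict <- if (excess <= tol_ok) {
    "ok"
  } else if (excess >= tol_bad) {
    "suspect"
  } else {
    "borderline"
  }
  list(excess = excess, n_over = n_over, verdict = verdict)
}

# Toy matrices standing in for a real TOM (illustration only).
tom_fine <- matrix(c(1,   0.2, 0.1,
                     0.2, 1,   0.3,
                     0.1, 0.3, 1), nrow = 3)
tom_bad <- tom_fine
tom_bad[1, 2] <- tom_bad[2, 1] <- 1.5    # deliberately out of range

res_fine <- check_tom_excess(tom_fine)
res_bad  <- check_tom_excess(tom_bad)
print(res_fine$verdict)
print(res_bad$verdict)
```

On a real analysis you would pass the matrix returned by TOMsimilarity() instead of the toy matrices.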


Thank you for your response Dr. Langfelder.

I got a value of 4.17 from max(TOM - 1). I also see that in my scale-free topology plot R^2 generally gets worse as the soft-threshold power increases, whereas in the tutorial R^2 generally improves as the power increases. Could this be a hint that I am doing something wrong in the code leading up to the construction of the adjacency matrix?

From sessionInfo() in R 4.0.2, I am using the following BLAS/LAPACK: /opt/software/OpenBLAS/0.3.7-GCC-8.3.0/lib/libopenblas_sandybridgep-r0.3.7.so
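A short sketch of how the linked BLAS/LAPACK can be read out programmatically, assuming a reasonably recent R (the BLAS and LAPACK fields of sessionInfo() are not available in old R versions, and the BLAS path can be empty on some platforms). The uses_openblas flag is a crude name match added for illustration, not an official test.

```r
# Report which BLAS/LAPACK libraries this R session is linked against.
si <- sessionInfo()
blas_path   <- si$BLAS     # path to the BLAS shared library (may be empty)
lapack_path <- si$LAPACK   # path to the LAPACK shared library

cat("R version:      ", R.version.string, "\n")
cat("BLAS:           ", blas_path, "\n")
cat("LAPACK:         ", lapack_path, "\n")
cat("LAPACK version: ", La_version(), "\n")

# Crude heuristic (an assumption): flag OpenBLAS by matching the library path.
uses_openblas <- any(grepl("openblas", c(blas_path, lapack_path),
                           ignore.case = TRUE))
cat("Path mentions OpenBLAS:", uses_openblas, "\n")
```

Running this on both the personal computer and the cluster makes it easy to see whether the two R installations differ in their BLAS.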


Something in the TOM calculation is definitely buggy. OpenBLAS has been a source of problems before although I'm not sure it is the problem here. Ideally you should try the same calculation with a different BLAS (ATLAS or MKL), although I assume you are doing this on a machine where you don't have admin privileges and you would probably have to be on very friendly terms with your admins to get them to install an alternate R with a different BLAS.

If you have time (unfortunately, you will need days), you can try to rerun the TOM calculation with the argument useInternalMatrixAlgebra = TRUE. This will skip the BLAS matrix multiplication and replace it with a simple C multiplication code that is neither fast nor parallel. It will probably take days to calculate TOM (for 1000 genes it takes about 1 second on my computer; for 100k genes it would take about 11 days if I had enough RAM), but it would go a long way toward figuring out whether OpenBLAS is the culprit here or whether it is something else in the code.
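The arithmetic behind that estimate can be sketched as follows, assuming the run time grows with the cube of the number of genes (as for a dense n x n matrix multiplication) and taking the 1 second per 1000 genes figure above as the baseline. The function names are hypothetical, and the memory figure counts only a single dense double-precision n x n matrix, so actual usage will be higher.

```r
# Rough run-time estimate, assuming cubic scaling with the number of genes.
# Baseline (1 s for 1000 genes) is the figure quoted above; it is machine-specific.
estimate_tom_days <- function(n_genes, ref_genes = 1000, ref_seconds = 1) {
  seconds <- ref_seconds * (n_genes / ref_genes)^3
  seconds / (60 * 60 * 24)
}

# Memory for ONE dense n x n matrix of doubles (8 bytes each), in GiB.
tom_memory_gib <- function(n_genes) {
  n_genes^2 * 8 / 2^30
}

days_100k <- estimate_tom_days(1e5)   # about 11.6 days
days_40k  <- estimate_tom_days(4e4)   # about 0.74 days
mem_100k  <- tom_memory_gib(1e5)      # about 74.5 GiB per matrix
print(c(days_100k = days_100k, days_40k = days_40k, mem_100k = mem_100k))
```

The slow run itself is the call mentioned above, TOMsimilarity(adjacency, useInternalMatrixAlgebra = TRUE).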


Thank you. I will try using the internal matrix algebra and report back on that. In the meantime I will try to discern what other BLAS libraries I can utilize on the computing cluster. I know there are different versions of R installed and I have the feeling one of them comes packaged with MKL...


I was able to run a slightly larger set on my personal computer and compare it with the same script on the computing cluster. I can confirm that OpenBLAS (cluster settings described above) produces a different result from my personal computer (libRblas, with LAPACK as libopenblasp-r0.2.20). The OpenBLAS version produces values greater than 1, whereas the libRblas version does not. I will try different BLAS implementations on the cluster.
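A quick sanity check along the same lines, as a sketch only: compare the matrix product computed by %*% (which R normally hands to the linked BLAS) with a plain loop that does not go through BLAS matrix multiplication. The matrix size, seed, and tolerance are arbitrary choices for illustration. A small test like this passing does not prove the BLAS is fine, since the problem described here only showed up on large inputs.

```r
# Compare the BLAS-backed product with a loop-based product on a small matrix.
set.seed(1)
n <- 60
A <- matrix(runif(n * n), n, n)
A <- (A + t(A)) / 2            # symmetric with entries in [0, 1], like an adjacency

blas_product <- A %*% A        # normally dispatched to the linked BLAS

naive_product <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # Element-wise multiply and sum: no BLAS matrix multiplication involved.
    naive_product[i, j] <- sum(A[i, ] * A[, j])
  }
}

max_abs_diff <- max(abs(blas_product - naive_product))
cat("Largest absolute difference:", max_abs_diff, "\n")

# Differences near machine precision are expected; large ones point at the BLAS.
if (max_abs_diff > 1e-8) {
  warning("BLAS product differs noticeably from the loop-based product")
}
```

Increasing n makes the check more demanding, at the cost of a much slower loop.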

teresisc ▴ 10
@teresisc-24311

Thank you, Dr. Langfelder, for your help with this. Your suggestion to try a BLAS implementation other than OpenBLAS worked for me. For others with this problem who may be reading this: I found that both the libRblas implementation and the MKL implementation of BLAS gave me correct values. The MKL implementation was slightly faster. In my case, my personal computer's installation of R was using libRblas by default, which is why I only noticed the issue once I moved to the computing cluster's installation of R (which uses OpenBLAS by default). Even if you keep all of your package versions identical when moving from your personal computer to a cluster, it is worth checking the BLAS implementation as well.

Once again, thank you so much! I would never have thought to check something like the BLAS implementation. You saved me a lot of time and headache!