Question

WGCNA distance measures, clustering with TOM based on adjacency matrix type = "distance" versus correlation

krr • 0

Hi all,

I'm attempting to reproduce this WGCNA output with small cell lung cancer (SCLC) microarray gene expression data available from the CCLE. After importing the gene expression data, collapsing on genes, selecting the 8,000 most variable genes based on median absolute deviation (MAD), correcting for batch effects with ComBat, and then median-centering each gene, I follow the WGCNA tutorial examples using my data. Unfortunately, I get a different output using the following parameters in the blockwiseModules() function:

net = blockwiseModules(datExpr0, power = 6,
                       networkType = "unsigned", minModuleSize = min(30, ncol(datExpr0)/2),
                       reassignThreshold = 0, mergeCutHeight = 0.25,
                       numericLabels = TRUE, pamRespectsDendro = FALSE,
                       saveTOMs = TRUE,
                       saveTOMFileBase = "CCLETOM",
                       verbose = 3)

It's close to the original result in that approximately the same number of gene modules are discovered, but obviously less interpretable.

Further complicating the situation, I don't know the normalization steps or parameters used to generate the original figure. As a result, I'm trying to work out by trial and error why my results are so different. My thought process thus far is the following:

Since my dendrogram does not nicely separate out by height, and varying the parameters for blockwiseModules() doesn't seem to change the dendrogram height much, I'm wondering if the original result used a TOM computed from an adjacency matrix based on a distance measure, rather than on correlation. To see if that's the difference, I've tried the following:

adjMat <- adjacency(datExpr0, power = 6, type = "distance", distOptions = "method = 'euclidean'")
TOM <- TOMsimilarity(adjMat, TOMDenom = 'min', verbose = 1)
consensusNetwork <- consensusDissTOMandTree(datExpr0, 6, TOM = TOM)

but get the following error for the consensusDissTOMandTree() function:

consensusNetwork <- consensusDissTOMandTree(datExpr0, 6, TOM = TOM)
Error in multiExpr[[1]]$data : $ operator is invalid for atomic vectors

I also can't seem to find an equivalent function in the WGCNA package that works on a single data set rather than finding consensus between two data sets. When I then try to cluster the TOM dissimilarity with hclust(), I get the following error and can't figure out why.

diss <- 1 - TOM
cluster_result <- hclust(diss, method = "average")
Error in hclust(diss, method = "average") : 'N' must be a single integer.

And that's where I'm stuck -- any recommendations about how to perform WGCNA based on an adjacency matrix calculated from Euclidean distance are much appreciated.

Moreover, any thoughts about why my output is different from the WGCNA results in the first place would also be a huge help.

Kind regards,

KRR

wgcna microarray clustering • 4.4k views

updated 8.2 years ago by Peter Langfelder • written 8.2 years ago by krr

score 1 · Accepted Answer · 2016-02-18

Without seeing the code and argument choices that led to the original figure, it will be pretty much impossible to reproduce it exactly. You should contact the authors for the code used to generate the analysis and the plot. From a visual inspection, it seems the authors (Udyavar et al.) used a low soft-thresholding power, may not have used TOM, and seem to have used a constant-height tree cut rather than the Dynamic Tree Cut procedure.
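
To illustrate what that combination of choices could look like, here is a minimal base-R sketch: a low soft-thresholding power, clustering on 1 - adjacency instead of TOM, and a constant-height cut. This is only a guess at the general approach, not the authors' actual code. The simulated data, the power of 2, and the cut height of 0.9 are all assumptions chosen for illustration.

```r
# Simulated expression matrix (samples in rows, genes in columns).
# All values are made up for illustration.
set.seed(1)
nSamples <- 30
s1 <- rnorm(nSamples)  # latent signal driving the first 10 genes
s2 <- rnorm(nSamples)  # latent signal driving the next 10 genes
datExpr <- cbind(
  sapply(1:10, function(i) s1 + rnorm(nSamples, sd = 0.3)),
  sapply(1:10, function(i) s2 + rnorm(nSamples, sd = 0.3)),
  matrix(rnorm(nSamples * 10), nrow = nSamples)  # 10 pure-noise genes
)
colnames(datExpr) <- paste0("gene", seq_len(ncol(datExpr)))

# Unsigned adjacency with a low soft-thresholding power (assumed value).
softPower <- 2
adj <- abs(cor(datExpr))^softPower

# Cluster directly on 1 - adjacency, i.e. without computing TOM.
geneTree <- hclust(as.dist(1 - adj), method = "average")

# Constant-height cut (assumed height) instead of Dynamic Tree Cut.
cutHeight <- 0.9
modules <- cutree(geneTree, h = cutHeight)
print(table(modules))
```

Within WGCNA, cutreeStatic() performs a similar constant-height cut and additionally lets you set a minimum module size.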

As to the errors you see, the function consensusDissTOMandTree needs, as input, multiple TOM matrices (typically from separate data sets), so it is not applicable to your single data set analysis.
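
For a single data set, the individual steps can be run by hand instead. The following is a rough sketch under stated assumptions, not a definitive recipe: datExpr0 is replaced by random numbers so the example is self-contained, and the deepSplit and minClusterSize values are arbitrary choices that would need tuning on real data.

```r
library(WGCNA)  # attaches dynamicTreeCut, which provides cutreeDynamic()

# Stand-in for the real expression data: 40 samples x 200 genes of noise.
set.seed(1)
datExpr0 <- matrix(rnorm(40 * 200), nrow = 40,
                   dimnames = list(paste0("s", 1:40), paste0("g", 1:200)))

# Adjacency from Euclidean distance, as in the question.
adjMat <- adjacency(datExpr0, power = 6, type = "distance",
                    distOptions = "method = 'euclidean'")

# Topological overlap for this one network, then its dissimilarity.
TOM <- TOMsimilarity(adjMat, TOMDenom = "min", verbose = 0)
dissTOM <- 1 - TOM

# hclust() needs a "dist" object, hence as.dist().
geneTree <- hclust(as.dist(dissTOM), method = "average")

# Module detection with Dynamic Tree Cut (parameter values are assumptions).
moduleLabels <- cutreeDynamic(dendro = geneTree, distM = dissTOM,
                              deepSplit = 2, pamRespectsDendro = FALSE,
                              minClusterSize = 30)
moduleColors <- labels2colors(moduleLabels)
print(table(moduleColors))
```

With pure noise as input, do not expect meaningful modules; the point is only the sequence of calls.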

For clustering, you need to use as.dist(diss) or as.dist(1-TOM) since hclust takes a distance structure, whereas TOM (and 1-TOM) is a matrix.
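
As a small self-contained illustration in base R, with a made-up 4 x 4 similarity matrix standing in for a real TOM:

```r
# Toy symmetric similarity matrix standing in for a TOM (values are made up).
genes <- paste0("gene", 1:4)
TOM <- matrix(c(1.0, 0.8, 0.1, 0.2,
                0.8, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.7,
                0.2, 0.1, 0.7, 1.0),
              nrow = 4, dimnames = list(genes, genes))

dissTOM <- 1 - TOM             # still a plain matrix
dissDist <- as.dist(dissTOM)   # "dist" object, which hclust() expects

geneTree <- hclust(dissDist, method = "average")
clusters <- cutree(geneTree, k = 2)
print(clusters)
```

Here gene1 and gene2 end up in one cluster and gene3 and gene4 in the other, matching the two blocks of high similarity in the toy matrix.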

HTH,

Peter