I'm attempting to reproduce this WGCNA output with small cell lung cancer (SCLC) microarray gene expression data available from the CCLE. After importing the gene expression data, collapsing it to the gene level, selecting the 8,000 most variable genes by median absolute deviation (MAD), correcting for batch effects with ComBat, and then median-centering each gene, I follow the WGCNA tutorial examples using my data. Unfortunately, I get a different output using the following parameters in the blockwiseModules() function:
net = blockwiseModules(datExpr0, power = 6, networkType = "unsigned", minModuleSize = min(30, ncol(datExpr0)/2), reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, saveTOMs = TRUE, saveTOMFileBase = "CCLETOM", verbose = 3)
My result is close to the original in that roughly the same number of gene modules is detected, but the modules are clearly less interpretable.
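For reference, the MAD filtering and median-centering steps described at the start can be sketched in base R as below. This is a minimal sketch assuming a genes-by-samples matrix named `expr` (a hypothetical name) after collapsing to genes. The ComBat step (from the sva package) is omitted, and the toy data are random.

```r
# Minimal sketch (base R only): keep the most variable genes by MAD,
# then median-center each gene. 'expr' is a hypothetical genes x samples matrix.
set.seed(1)
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12)))

# Keep the top-N genes by median absolute deviation (8,000 in the post; 50 here)
select_top_mad <- function(expr, n_top) {
  gene_mad <- apply(expr, 1, mad)
  keep <- order(gene_mad, decreasing = TRUE)[seq_len(min(n_top, nrow(expr)))]
  expr[keep, , drop = FALSE]
}

# Subtract each gene's median across samples
median_center <- function(expr) {
  sweep(expr, 1, apply(expr, 1, median), FUN = "-")
}

filtered <- select_top_mad(expr, n_top = 50)
centered <- median_center(filtered)

# WGCNA expects samples in rows and genes in columns
datExpr0 <- t(centered)
dim(datExpr0)
```

Transposing at the end matters because WGCNA functions such as blockwiseModules() treat columns as genes.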
Further complicating the situation, I don't know the normalization steps or parameters used to generate the original figure. As a result, I'm trying to work out by trial and error why my results are so different. My reasoning so far is as follows:
Since my dendrogram does not separate cleanly by height, and varying the blockwiseModules() parameters doesn't seem to change the dendrogram heights much, I'm wondering whether the original result used a TOM computed from a distance-based adjacency matrix rather than a correlation-based one. To test whether that's the difference, I've tried the following:
adjMat <- adjacency(datExpr0, power = 6, type = "distance", distOptions = "method = 'euclidean'")
TOM <- TOMsimilarity(adjMat, TOMDenom = 'min', verbose = 1)
consensusNetwork <- consensusDissTOMandTree(datExpr0, 6, TOM = TOM)
but get the following error for the consensusDissTOMandTree() function:
consensusNetwork <- consensusDissTOMandTree(datExpr0, 6, TOM = TOM)
Error in multiExpr[]$data : $ operator is invalid for atomic vectors
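To make the correlation-versus-distance comparison concrete, here is a base-R sketch of both adjacency definitions and of an unsigned TOM built from either one, for a single data set (no multi-set structure involved). It assumes the formulas as I understand them from WGCNA's ?adjacency help page: unsigned correlation adjacency |cor|^power, and distance adjacency (1 - (d/max(d))^2)^power, where d is the Euclidean distance between genes. The TOM follows the Zhang and Horvath unsigned formula with the "min" denominator. This is an illustrative re-implementation with hypothetical function names, not WGCNA's own code.

```r
# Illustrative base-R re-implementation (not WGCNA's code); function names are hypothetical.
# datExpr: samples in rows, genes in columns (WGCNA convention).

# Unsigned correlation adjacency: |cor|^power
cor_adjacency <- function(datExpr, power = 6) {
  abs(cor(datExpr))^power
}

# Distance adjacency, ASSUMING the formula from WGCNA's ?adjacency:
# (1 - (d / max(d))^2)^power, with d the Euclidean distance between genes.
# The default power = 1 is my assumption about WGCNA's default for type = "distance".
dist_adjacency <- function(datExpr, power = 1) {
  d <- as.matrix(dist(t(datExpr), method = "euclidean"))  # genes x genes
  (1 - (d / max(d))^2)^power
}

# Unsigned TOM with the "min" denominator:
# TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
tom_similarity <- function(adj) {
  a <- adj
  diag(a) <- 0                      # exclude self-connections
  l <- a %*% a                      # shared-neighbour term l_ij (sum over u != i, j)
  k <- rowSums(a)                   # connectivity k_i
  tom <- (l + a) / (outer(k, k, pmin) + 1 - a)
  diag(tom) <- 1
  tom
}

set.seed(2)
datExpr <- matrix(rnorm(20 * 30), nrow = 20,
                  dimnames = list(NULL, paste0("gene", 1:30)))

tom_cor  <- tom_similarity(cor_adjacency(datExpr, power = 6))
tom_dist <- tom_similarity(dist_adjacency(datExpr, power = 6))
```

Comparing the two TOMs on the same data is one way to see how much the choice of adjacency alone changes the resulting tree heights.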
I also can't find an equivalent function in the WGCNA package for a single data set, i.e., one that is not intended for finding a consensus across multiple data sets. Then, when I try to cluster the TOM dissimilarity with hclust(), I get the following error and can't figure out why:
diss <- 1 - TOM
cluster_result <- hclust(diss, method = "average")
Error in hclust(diss, method = "average") : 'N' must be a single integer.
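A likely culprit for that hclust() error: hclust() expects a dissimilarity object of class "dist" (as produced by dist()), not a plain matrix, so a 1 - TOM matrix generally needs to go through as.dist() first; the WGCNA tutorials cluster with hclust(as.dist(dissTOM), method = "average"). The sketch below uses a random symmetric matrix as a hypothetical stand-in for a real TOM, and base R's cutree() purely as a stand-in for module detection. The WGCNA tutorials use dynamicTreeCut's cutreeDynamic() on the tree instead.

```r
# Minimal sketch: hclust() needs a "dist" object, so convert 1 - TOM with as.dist().
# 'toy_tom' is a hypothetical stand-in for a real TOM (symmetric, values in [0, 1], diagonal 1).
set.seed(3)
n_genes <- 40
x <- matrix(runif(n_genes * n_genes), n_genes)
toy_tom <- (x + t(x)) / 2           # make it symmetric
diag(toy_tom) <- 1
gene_names <- paste0("gene", 1:n_genes)
dimnames(toy_tom) <- list(gene_names, gene_names)

diss <- 1 - toy_tom                  # still a plain matrix at this point
gene_tree <- hclust(as.dist(diss), method = "average")

# Stand-in for module detection: a simple cut into k groups with stats::cutree().
# (The WGCNA tutorials use dynamicTreeCut::cutreeDynamic() here instead.)
modules <- cutree(gene_tree, k = 4)
table(modules)
```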
And that's where I'm stuck. Any recommendations on how to perform WGCNA with an adjacency matrix based on Euclidean distance would be much appreciated.
Moreover, any thoughts on why my output differs from the WGCNA results in the first place would also be a huge help.