Question

WGCNA clustering configuration

Lluís Revilla Sancho ▴ 730

@lluis-revilla-sancho

Hi,

I know WGCNA is not in Bioconductor, but this seems the best place to ask. I have a few questions about how to use it.

If I understood correctly, WGCNA performs an unsupervised clustering, but one can also define the clusters manually. With blockwiseModules, the blocks option lets one assign genes to a cluster, but it seems to force every gene into a cluster, which rules out semi-supervised clustering. Is it possible to force certain genes into one cluster and cluster the others "freely"?

Can information other than expression, such as GO terms or pathways, be used to build the clusters? AFAIK one can use a parameter such as time or treatment to build the clusters, but it would be hard to create one manually for each GO term or pathway at the gene level.

Much of the work related to WGCNA is based on co-expression: genes that are highly expressed together are clustered together. However, the correlation could be the opposite: whenever gene X is highly expressed, gene Y is lowly expressed, and whenever Y is highly expressed, X is lowly expressed.

I couldn't find any information about these questions, but I may have missed it while reading the available documentation and tutorials. Many thanks.

clustering wgcna microarray • 1.8k views

updated 8.0 years ago by Peter Langfelder ★ 3.0k • written 8.0 years ago by Lluís Revilla Sancho ▴ 730

score 1 · Answer 1 · 2016-05-13

You seem to confuse the pre-clustering needed to handle large data sets with the actual clustering into modules. Although both are unsupervised and based on the data, pre-clustering uses a simpler method that needs less memory than the hierarchical clustering used for module detection. This allows the pre-clustering to split very large data sets into smaller blocks that can be analyzed one by one. Every gene is forced into one of the blocks because all genes should be present in the analysis; the actual clustering then assigns each gene to a module or to no module at all.

You can in principle use any similarity measure with WGCNA, e.g. one derived from external literature or databases such as GO, and I am sure there are approaches out there that combine data and external information.
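As a rough illustration of a similarity derived from external annotation, the sketch below builds a gene-by-gene Jaccard similarity from sets of GO terms. It is written in plain Python rather than R to stay self-contained. The gene names, the term labels and the choice of Jaccard overlap are all assumptions made for this example, not something WGCNA prescribes.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical annotation: gene names and term labels are placeholders.
go_terms = {
    "geneA": {"GO:term1", "GO:term2", "GO:term3"},
    "geneB": {"GO:term2", "GO:term3"},
    "geneC": {"GO:term4"},
}

genes = sorted(go_terms)

# Symmetric similarity with values in [0, 1] and 1 on the diagonal.
similarity = {
    g: {h: jaccard(go_terms[g], go_terms[h]) for h in genes}
    for g in genes
}

for g in genes:
    print(g, [round(similarity[g][h], 2) for h in genes])
```

The result is symmetric with values between 0 and 1, so a matrix like this could in principle stand in for, or be combined with, a correlation-based similarity before module detection.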

Regarding your last question, your wording is a bit confusing, but you seem to ask whether WGCNA can create a network based on the absolute value of the correlation, that is, a network in which negatively correlated genes are also strongly connected. Yes, it can; in fact it is the default, although we now recommend signed networks, in which only positively correlated genes are (strongly) connected.
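To make the difference concrete, here is a minimal sketch in plain Python (not the WGCNA R code) of the two adjacency transformations: the unsigned form raises the absolute correlation to a power, while the signed form first maps the correlation from [-1, 1] to [0, 1]. The expression values are made up, and the powers 6 and 12 are illustrative choices only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def unsigned_adjacency(cor, beta=6):
    # Unsigned network: strong negative and strong positive correlations
    # both give a high adjacency.
    return abs(cor) ** beta

def signed_adjacency(cor, beta=12):
    # Signed network: the correlation is mapped from [-1, 1] to [0, 1] first,
    # so negatively correlated genes end up with an adjacency near 0.
    return ((1 + cor) / 2) ** beta

# Toy profiles (made up): gene Y falls whenever gene X rises.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [5.1, 3.9, 3.0, 2.1, 0.9]

cor_xy = pearson(gene_x, gene_y)
adj_unsigned = unsigned_adjacency(cor_xy)
adj_signed = signed_adjacency(cor_xy)

print(f"cor={cor_xy:.3f} unsigned={adj_unsigned:.3f} signed={adj_signed:.3g}")
```

With these toy values X and Y are almost perfectly anti-correlated, so they are strongly connected in the unsigned network and essentially unconnected in the signed one.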

Peter

score 0 · Answer 2 · 2016-05-17

It's not clear to me why you chose the blocks as you did. In particular, the blocks are assumed to be labelled by positive integers (my fault that the positivity is not explicitly specified in the help, but it is implied). The block with label 0 is ignored, and you get one module in the block with 100 genes. If you really need to split the data into blocks, please use blocks in which all genes are assigned to a block with positive, consecutive integer labels. I recommend you use the function projectiveKMeans to find the blocks for you.
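As a sketch of the labelling rule, the check below flags a block vector that contains non-positive labels or gaps in the numbering. It is plain Python for illustration only: the function name and the example vectors are hypothetical, and reading "consecutive" as "1, 2, 3, ... with no gaps" is an assumption.

```python
def block_label_problems(labels):
    """Return a list of problems found in a per-gene vector of block labels."""
    problems = []
    # Every gene needs a positive integer label; 0 or negative labels are flagged.
    if any(not isinstance(label, int) or label < 1 for label in labels):
        problems.append("labels must be positive integers")
    # The positive labels in use should be 1, 2, ..., k with no gaps (assumed meaning).
    used = sorted({label for label in labels if isinstance(label, int) and label >= 1})
    if used != list(range(1, len(used) + 1)):
        problems.append("labels must be consecutive, starting at 1")
    return problems

# Hypothetical block vectors, one label per gene.
bad_blocks = [0] * 50 + [1] * 100   # 50 genes carry the label 0
good_blocks = [1] * 50 + [2] * 100  # every gene is in block 1 or block 2

print(block_label_problems(bad_blocks))
print(block_label_problems(good_blocks))
```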

I strongly suggest that you read some of the WGCNA articles and work through the WGCNA tutorials to understand the input, aims and output of WGCNA.

Peter