Question: WGCNA: What is `soft thresholding`?
14 months ago, jol.espinoz wrote:

I have been using the WGCNA package (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html) for correlation networks. 
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559

I understand what a hard threshold is: take the absolute value of the correlation matrix, choose a cutoff (e.g. 0.85), and treat any pair above the cutoff as connected in the network. But then there is soft thresholding, where you raise each entry of the correlation matrix to a power, which accentuates the larger correlations relative to the smaller ones.

How do you then decide which ones are connected or not? Do you do a hard threshold after the soft thresholding?

Basically, going from `|correlation|` => `exponentiated(correlation)` => `adjacency_matrix`
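
To make the two routes concrete, here is a minimal Python sketch on a made-up 3-gene matrix. The function names, the toy values and the power of 6 are assumptions chosen for illustration; this is not WGCNA's own (R) code. The hard threshold returns 0/1 entries, while the soft threshold keeps every pair and only rescales it: the 0.9 and 0.3 correlations differ by a factor of 3 before the power and by a factor of 729 after it.

```python
# Toy absolute correlations between three genes (made-up values).
abs_cor = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]

def hard_adjacency(abs_cor, cutoff):
    """Unweighted network: 1 if |cor| reaches the cutoff, else 0."""
    return [[1 if s >= cutoff else 0 for s in row] for row in abs_cor]

def soft_adjacency(abs_cor, power):
    """Weighted network: every pair keeps a strength |cor| ** power."""
    return [[s ** power for s in row] for row in abs_cor]

hard = hard_adjacency(abs_cor, cutoff=0.85)  # cutoff taken from the question
soft = soft_adjacency(abs_cor, power=6)      # power is an arbitrary example

for row in hard:
    print(row)
for row in soft:
    print([round(a, 4) for a in row])
```

No neighbor list comes out of the soft version; it only gives strengths, which is the point raised in the quoted passage below.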

`One potential drawback of soft thresholding is that it is not clear how to define the directly linked neighbors of a node. A soft adjacency matrix only allows one to rank all the nodes of the network according to how strong their connection strength is with respect to the node under consideration. If a list of neighbors is requested, one needs to threshold the connection strengths, i.e. the values in the adjacency matrix. When dealing with an unweighted network, this is equivalent to the standard approach of hard thresholding the co-expression similarities since the adjacency function is monotonically increasing by definition.`

https://labs.genetics.ucla.edu/horvath/GeneralFramework/WeightedNetwork2005.pdf

Can you change the hard threshold for the soft threshold or is that hard-coded in there? If so, what is that threshold?

14 months ago, Lluís R (European Union) wrote:

The soft threshold is a power (exponent) to which the correlations between genes are raised. The assumption is that raising the correlations to a power reduces the noise of the correlations in the adjacency matrix. To pick a threshold, use the pickSoftThreshold function, which calculates, for each candidate power, how closely the resulting network resembles a scale-free graph. The power that gives the closest resemblance to a scale-free network (in practice, usually the lowest power at which the scale-free fit becomes high) is the one you should use.
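
As a rough illustration of what "resembles a scale-free graph" can mean: in a scale-free network the frequency p(k) of nodes with connectivity k falls off roughly as a power law, so log p(k) is close to linear in log k with a negative slope. The Python sketch below is not WGCNA's code; the function name `scale_free_fit`, the equal-width binning and the toy connectivities are all assumptions made for this example. It fits that line and reports the slope and R².

```python
import math

def scale_free_fit(connectivity, n_bins=10):
    """Fit log10 p(k) against log10 k; return (slope, r_squared).

    Assumes positive connectivities that are not all equal and that
    fall into at least two different bins.
    """
    lo, hi = min(connectivity), max(connectivity)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    totals = [0.0] * n_bins
    for k in connectivity:
        b = min(int((k - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[b] += 1
        totals[b] += k
    # One point per non-empty bin: (log10 mean connectivity, log10 frequency).
    points = [(math.log10(t / c), math.log10(c / len(connectivity)))
              for c, t in zip(counts, totals) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Made-up connectivities: many weakly connected nodes, a few hubs.
toy_connectivity = [1] * 100 + [2] * 25 + [4] * 6 + [8] * 2
slope, r_squared = scale_free_fit(toy_connectivity)
print(round(slope, 2), round(r_squared, 3))
```

In a real analysis the connectivities would come from the adjacency matrix at each candidate power, and the fit would be compared across powers.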

I use soft thresholding and haven't used a hard threshold, because I assume that the metabolic network should be scale-free. After soft thresholding I use the Topological Overlap Measure (TOM) and dynamicTreeCut to create the clusters.


Thanks @Lluis! I understand the concept of exponentiating the correlation matrix, but I am still confused about how `WGCNA` determines which genes/nodes are connected in the network. For example, if one had a correlation of 0.99, so that `0.99**12 = 0.886`, what happens to that value? Is there a hard threshold underlying the soft threshold?

Written 14 months ago by jol.espinoz

Well, usually the steps are: correlation -> adjacency (cor**power) -> Topological Overlap Measure (which takes into account the correlations with the other genes to assess how strongly two genes are connected, see) -> clusters (with the Dynamic Tree Cut algorithm). It is in this last step that the clusters are created, see. There is a hard threshold in the Dynamic Tree Cut process, but it applies to the number of genes in each module.

Written 14 months ago by Lluís R

Thanks again for the response, Lluis. Shouldn't TOM have a threshold, since it needs to count the shared neighbors?

Written 14 months ago by jol.espinoz

TOM calculation counts neighbors using a weighted sum: the weaker the connection, the less it counts.
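
A minimal Python sketch of that weighted sum, assuming the unsigned topological overlap formula from the Zhang and Horvath (2005) paper linked in the question: TOM_ij = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the sum of node i's connection strengths. The function name `weighted_tom` and the toy matrix are made up for illustration; this is not the WGCNA implementation.

```python
def weighted_tom(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix
    with entries in [0, 1]; the diagonal of the result is set to 1."""
    n = len(adj)
    # Connectivity k_i: sum of connection strengths to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # "Shared neighbors" as a weighted sum: a neighbor u counts
            # only as much as both of its connections are strong.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy weighted network: three nodes, every pair connected with strength 0.5.
adj = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
tom = weighted_tom(adj)
print(tom[0][1])
```

With 0/1 adjacencies the same formula reduces to counting shared neighbors, so no cutoff is needed at this step.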

Written 14 months ago by Peter Langfelder

Hi Peter, is that a `signed TOM`? My understanding of TOM is that for `node_A` and `node_B` it takes the overlap of the neighbors of `node_A` and `node_B` and then normalizes. By weighted, do you mean that the weight of each neighbor is taken into account when deciding whether 2 nodes have overlapping neighbors? So in the end, everything is connected (either strongly or loosely)? Are there still any cutoffs during this part?

Modified 14 months ago • written 14 months ago by jol.espinoz

See slide 13 of this presentation about the meaning of weighted networks and their relationship with thresholds. "Weighted" means that the connection between node_A and node_B can take any value in a continuous range (between -1 and 1, say), rather than only 1 or 0 for connected or not connected, which would be an unweighted network, like DE analysis or a network of people I know. So yes, everything is connected, or at least we can calculate a correlation between everything. Cutoffs are not needed to create the TOM matrix.

Written 14 months ago by Lluís R

I'm about ready to select this as the correct answer. Thanks again for your time, Lluis, in helping me understand this. Just to clarify: during the TOM calculation of a weighted network, is that a signed TOM? And lastly, is the `nearest neighbors` calculation in a network (getting the actual nodes that are connected) irrelevant in a weighted network?

Written 14 months ago by jol.espinoz

TOM in a weighted network can be signed or unsigned. Whether it is signed or unsigned has nothing to do with whether the network is weighted or unweighted.

Nearest neighbors of a node (call the node A) can be generalized in a weighted network to those nodes that have the highest connection strength to node A. It could be relevant in some analyses (network neighborhood analysis).
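
One way to picture this generalization is to rank the other nodes by their connection strength to node A and keep the top few. The Python sketch below does that; the function name `strongest_neighbors`, the node labels and the strengths are hypothetical, and how many neighbors to keep is the analyst's choice rather than something fixed by the method.

```python
def strongest_neighbors(adj, names, node, top_n):
    """Return the top_n nodes with the highest connection strength to `node`."""
    i = names.index(node)
    ranked = sorted(
        ((adj[i][j], names[j]) for j in range(len(names)) if j != i),
        reverse=True,  # strongest connection first
    )
    return [name for _, name in ranked[:top_n]]

names = ["A", "B", "C", "D"]
# Toy symmetric weighted adjacency matrix (made-up strengths).
adj = [
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.6],
    [0.4, 0.3, 0.6, 1.0],
]
neighbors_of_a = strongest_neighbors(adj, names, "A", top_n=2)
print(neighbors_of_a)
```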

Written 14 months ago by Peter Langfelder

Can't thank you and Lluis enough for helping me get the details of all of this.  Great tool Peter!

Written 14 months ago by jol.espinoz

Signed TOM takes into account that the pattern of correlation signs among some genes may be unclear. It is explained in the technical paper you link, but the idea is that a gene A may be positively correlated with genes B and C while B and C are negatively correlated with each other. So how can A be positively correlated with both B and C? => the signal/pattern is not clear. An unsigned TOM doesn't care about such noise, and the resulting modules would be less correlated. You can calculate the TOM matrix with either the signed or the unsigned approach.

Well, it is almost never irrelevant: the whole point of this technique is to identify how the network of genes works, that is, which genes work together and how. Thus, defining the nearest neighbors is fundamental. In other approaches, people try to achieve that by looking at which proteins interact, which transcription factors repress other genes... However, these neighbors (genes in the same module) are then explored further with enrichment analysis, to assess whether they are really meaningful from a biological point of view.

 

This is also helping me to realize things and to learn how to explain them, as well as to check that I don't say anything different from Peter, one of the authors of WGCNA.

Written 14 months ago by Lluís R
14 months ago, Peter Langfelder (United States) wrote:

WGCNA, as the name implies, is a tool primarily intended for analyzing weighted networks. In a weighted network, you don't decide which nodes are connected and which are not - all nodes are in principle connected, but the strength varies (by convention) between 0 and 1. Soft thresholding really means suppressing low correlations in a continuous ("soft") manner rather than the discontinuous ("hard") thresholding used in constructing unweighted networks.
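
The continuous-versus-discontinuous distinction can be seen by evaluating both functions just below and just above a cutoff. This is a small Python sketch; the cutoff 0.85 and the power 12 are simply the numbers used elsewhere in this thread, and the function names are made up.

```python
def hard_threshold(s, cutoff=0.85):
    """Discontinuous: jumps from 0 to 1 at the cutoff."""
    return 1.0 if s >= cutoff else 0.0

def soft_threshold(s, power=12):
    """Continuous: low correlations are suppressed, none is cut off."""
    return s ** power

for s in (0.50, 0.84, 0.86, 0.99):
    print(s, hard_threshold(s), round(soft_threshold(s), 3))
```

Between 0.84 and 0.86 the hard version flips from 0 to 1, while the soft version changes only slightly; a correlation of 0.99 simply stays in the network with strength `0.99**12`.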

 
