Question

WGCNA package: identification of hub genes


bharata1803

I have a research problem that I want to solve. Basically, I want to find the important genes in each module generated by the WGCNA algorithm.

I define important genes as hub genes, that is, the genes with the highest connectivity. From some papers I have read, the calculation is to sum the weights of all edges attached to each node, sort the nodes from the highest sum to the lowest, and select the top 1%, 5%, or 10%.
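To make the calculation concrete, here is a minimal sketch of how I understand it, written in base R on made-up data. The toy matrix, the 10% fraction, and the variable names (adj, connectivity, hubGenes) are only my assumptions for illustration; the adjacency is computed by hand as |cor|^power, which is how I understand the default unsigned network to be defined.

```r
# Toy expression data: 20 samples (rows) x 50 genes (columns).
# The values are random and only serve to illustrate the calculation.
set.seed(1)
datExpr <- matrix(rnorm(20 * 50), nrow = 20,
                  dimnames = list(NULL, paste0("gene", 1:50)))

softPower <- 6

# Unsigned weighted adjacency computed by hand: |cor|^power
adj <- abs(cor(datExpr))^softPower
diag(adj) <- 0  # do not count a gene's connection to itself

# Weighted connectivity: sum of the edge weights of each gene
connectivity <- rowSums(adj)

# Sort from highest to lowest and keep the top fraction (10% here)
topFraction <- 0.10
nHub <- ceiling(topFraction * length(connectivity))
hubGenes <- names(sort(connectivity, decreasing = TRUE))[seq_len(nHub)]

print(hubGenes)
```

Changing topFraction to 0.01 or 0.05 would give the other cutoffs mentioned above.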

Once I have the list of important genes, I need to find the modules in the network generated by WGCNA and map the hub genes to those modules. That way, I will have the modules and the important genes in each module.
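The mapping step I have in mind could look roughly like the sketch below. Everything here is hypothetical: the connectivity values are random, and moduleColors stands in for whatever module assignment vector the module detection step produces (one label per gene).

```r
set.seed(2)
genes <- paste0("gene", 1:12)

# Hypothetical per-gene connectivity (random values for illustration)
connectivity <- setNames(runif(12), genes)

# Hypothetical module assignment, one label per gene
moduleColors <- setNames(rep(c("blue", "brown", "turquoise"), each = 4), genes)

# Global hub genes: here simply the 4 genes with the highest connectivity
hubGenes <- names(sort(connectivity, decreasing = TRUE))[1:4]

# Map each hub gene to its module and group the hubs by module
hubsPerModule <- split(hubGenes, moduleColors[hubGenes])

print(hubsPerModule)
```

One thing I notice with this approach is that a module can end up with no hub genes at all if none of its genes are in the global top list, so ranking the genes separately inside each module might be an alternative.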

To do that, I am working through the basic WGCNA tutorials, because I am new to this package. I have followed the tutorials from: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/.

In tutorial 2b, I have followed the steps up to the calculation of the Topological Overlap Matrix (TOM). Below is the code:

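# Soft-thresholding power
softPower = 6;
# Adjacency matrix computed from the expression data (datExpr)
adjacency = adjacency(datExpr, power = softPower);
# Topological overlap matrix and the corresponding dissimilarity
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM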

It seems this is the part where the network is generated, in the form of an adjacency matrix.

My questions are:

1. Which matrix is used to calculate hub genes: adjacency or dissTOM? I checked, and the dissTOM matrix contains values above 0.99. Is that right?

2. Is it better to apply a cutoff first, to decide whether two genes are connected, before calculating the weight sum used to determine hub genes? If a gene pair has a weight below the cutoff, I set it to 0; otherwise I set it to 1. That way, I only need to count the 1s for each gene to determine the hub genes.
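To show what I mean in question 2, here is a small base R sketch on made-up data. The cutoff of 0.01 is arbitrary and only for illustration, and the adjacency is again computed by hand as |cor|^power rather than taken from my real data.

```r
# Toy expression data: 20 samples (rows) x 30 genes (columns), random values
set.seed(3)
datExpr <- matrix(rnorm(20 * 30), nrow = 20,
                  dimnames = list(NULL, paste0("gene", 1:30)))

# Weighted adjacency computed by hand: |cor|^power
adj <- abs(cor(datExpr))^6
diag(adj) <- 0  # ignore self-connections

# Hard threshold: weights below the cutoff become 0, all others become 1
cutoff <- 0.01  # arbitrary value, chosen only for this example
binaryAdj <- (adj >= cutoff) * 1

# Unweighted connectivity: number of 1s (neighbours) per gene
degree <- rowSums(binaryAdj)

# Weighted connectivity without any cutoff, for comparison
weightedConnectivity <- rowSums(adj)

# Compare the two rankings for the five most connected genes
print(head(sort(degree, decreasing = TRUE), 5))
print(head(sort(weightedConnectivity, decreasing = TRUE), 5))
```

The two rankings do not have to agree, which is why I am asking which of them is the appropriate one for hub genes.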

Thank you very much.

wgcna • 6.3 years ago • bharata1803