Hi!
I hope you’re doing well during this strange and chaotic time.
I ran WGCNA on my RNA-seq data (18,000 genes) and got some nice modules whose eigengenes seem to correlate well with the traits I’m interested in. I then tried to look for hub genes in these modules and to visualize the networks. However, since most of my modules of interest are fairly big (200-1000 genes), I set a high threshold on the edge weights when exporting the network to Cytoscape, the idea being to present networks that the human eye can manage to a degree. Here’s an example of a network I have: I think it looks fine, and it revealed a few highly connected genes that are very interesting.
However, while thinking through my method, I’m not entirely sure I did it correctly. In other words, I’m not sure that the highly connected genes in the current network graph are indeed among the most highly connected genes of the entire module. Because I filtered out most of the edges when exporting data for the graph, it dawned on me that there’s a strong possibility I actually filtered out the real hub genes. For instance, in a module of 1000 genes, there could be three hub genes connected to 900, 800, and 700 genes, but if none of their edges met my threshold cutoff, I would have missed them completely.
Do you think my concerns here are real, and that I have probably missed a lot of important genes by doing so? If so, do you have any suggestions on how to visualize the networks in a way that the human eye can manage (i.e. not too many genes in the network) while still showing the real hubs?
Thanks in advance!
Best, Minya
Hello Minya, please elaborate on the exact cut-off criteria that you used for eliminating edges and vertices (genes), i.e., which metric(s) did you use? Thank you. By the way, I think that it is okay to have a filtered graph for visualisation purposes, while the analysis methods continue on the unfiltered version.
Sorry, I should have explained that part more!
I followed the "Export network to Cytoscape" part in the WGCNA tutorial:
For each module of interest, besides changing the "#Select modules" part, I only changed the threshold in the last line, in the call to exportNetworkToCytoscape.
To determine the threshold, I look at the distribution of edge weights in that module and arbitrarily choose the 90th, 95th, or 99th percentile of the distribution as the threshold. For example, the graph I posted above is from a large module with about 1000 genes, and I think I chose the 99th percentile of the edge weight distribution. I have read other papers doing the same thing, which is why I decided to do it like this in the beginning. But now I'm worried this is not the correct way to do it, since I left out most of the edges.
Thank you!
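The percentile-based cutoff described in this post can be illustrated with a small sketch. It is not the poster's code: it uses plain Python on a hypothetical 5-gene matrix (`toy_tom`) standing in for a module's topological-overlap weights, a nearest-rank percentile, and a strict "greater than" comparison, all of which are assumptions made for illustration only.

```python
import math

def upper_triangle_weights(adj):
    """Return each undirected edge weight once (i < j), skipping the diagonal."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(i + 1, n)]

def percentile_cutoff(weights, pct):
    """Nearest-rank percentile: smallest value with at least pct% of weights <= it."""
    ordered = sorted(weights)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical 5-gene module; values stand in for edge weights between genes.
toy_tom = [
    [1.00, 0.90, 0.85, 0.10, 0.05],
    [0.90, 1.00, 0.30, 0.20, 0.15],
    [0.85, 0.30, 1.00, 0.25, 0.12],
    [0.10, 0.20, 0.25, 1.00, 0.40],
    [0.05, 0.15, 0.12, 0.40, 1.00],
]

weights = upper_triangle_weights(toy_tom)

# Cutoff value at each percentile, and how many edges lie strictly above it.
cutoffs = {pct: percentile_cutoff(weights, pct) for pct in (50, 80, 90)}
kept = {pct: sum(w > cut for w in weights) for pct, cut in cutoffs.items()}

for pct in cutoffs:
    print(f"{pct}th percentile: cutoff={cutoffs[pct]:.2f}, edges kept={kept[pct]}")
```

Even in this tiny example, raising the percentile quickly reduces the graph to a handful of edges, which is the effect the poster is worried about in a 1000-gene module.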
I see... at 0.1, your threshold is actually lower than the default. So, it really depends on what you are then doing in Cytoscape. Is it just for added visualisation, or are you performing further analyses in Cytoscape?
WGCNA is about weighted networks, i.e., where everything is connected to everything, but the connections (edges) are weighted to infer strength of connection. I am sure that Cytoscape can support such networks, too, whereas other network functions may not implicitly look at the edge weights.
All depends on what you are doing with the data in Cytoscape.
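To make the distinction between weighted and thresholded networks concrete, here is a minimal sketch in plain Python. The gene names, the matrix `toy_adj`, and the 0.80 cutoff are all hypothetical. It compares a weighted connectivity (the sum of a gene's edge weights within the module) with the number of edges that survive a hard cutoff; WGCNA provides its own intramodular connectivity measures for real data, so treat this only as an illustration of the idea.

```python
def weighted_connectivity(adj):
    """Sum of edge weights per gene, excluding the diagonal (self-connection)."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(adj)]

def thresholded_degree(adj, cutoff):
    """Number of edges per gene whose weight strictly exceeds the cutoff."""
    return [sum(1 for j, w in enumerate(row) if j != i and w > cutoff)
            for i, row in enumerate(adj)]

genes = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical gene names

# g1 has moderate links to every other gene; g4 and g5 share one very strong link.
toy_adj = [
    [1.00, 0.60, 0.60, 0.60, 0.60],
    [0.60, 1.00, 0.10, 0.10, 0.10],
    [0.60, 0.10, 1.00, 0.10, 0.10],
    [0.60, 0.10, 0.10, 1.00, 0.95],
    [0.60, 0.10, 0.10, 0.95, 1.00],
]

k_within = weighted_connectivity(toy_adj)
degree = thresholded_degree(toy_adj, cutoff=0.80)

# Hub according to the full weighted network vs. hubs in the filtered graph.
top_weighted = genes[k_within.index(max(k_within))]
top_filtered = [g for g, d in zip(genes, degree) if d == max(degree)]

print("weighted hub:", top_weighted)
print("most connected after filtering:", top_filtered)
```

In this toy case the gene with the highest weighted connectivity has no edge above the cutoff, so it would not look like a hub in the filtered picture. Ranking hubs on the unfiltered weights and using the filtered graph only for display avoids that mismatch.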
I'm sorry, this is a mistake on my part: I don't remember why it's threshold = 0.1 here; it must be a change that shouldn't have been saved in the code. My apologies!
I just double-checked everything, and everything else in my previous reply is correct (i.e. how I filtered the edges by percentile based on the edge weight distribution of the network). All the modules I exported in this way for Cytoscape had edge weights between 0.80 and 0.97. For instance, in the network graph I posted initially, the edge weights were all between 0.80 and 0.93 for that particular module, which did reveal a few highly connected genes, but I was concerned that I had left out other highly connected genes by filtering.
Okay, no problem. It's possible that they have been left out, but I figure that Cytoscape would simply show them on the side, with no connections to anything. This is the default behaviour of igraph, but I am not sure how Cytoscape handles these (if present), nor am I sure how the WGCNA export function handles these (it may or may not include them in the exported object). You could possibly determine via a few lines of code which, if any, genes are lost as a result of the filtering.
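The "few lines of code" check mentioned above could look roughly like the following sketch. It is plain Python on a hypothetical 4-gene matrix; the gene names, the weights, the 0.80 cutoff, and the strict "greater than" comparison are assumptions, not the behaviour of the WGCNA export function. The idea is to build the filtered edge list and then list every gene that no longer appears in any edge.

```python
def filter_edges(adj, names, cutoff):
    """Keep each undirected edge (i < j) whose weight strictly exceeds the cutoff."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if adj[i][j] > cutoff:
                edges.append((names[i], names[j], adj[i][j]))
    return edges

def genes_without_edges(names, edges):
    """Genes that appear in no retained edge, in their original order."""
    connected = {gene for a, b, _ in edges for gene in (a, b)}
    return [gene for gene in names if gene not in connected]

names = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical module members

# Hypothetical symmetric edge weights for the module.
toy_adj = [
    [1.00, 0.92, 0.30, 0.25],
    [0.92, 1.00, 0.35, 0.20],
    [0.30, 0.35, 1.00, 0.40],
    [0.25, 0.20, 0.40, 1.00],
]

edges = filter_edges(toy_adj, names, cutoff=0.80)
lost = genes_without_edges(names, edges)

print("edges kept:", edges)
print("genes with no remaining edge:", lost)
```

Comparing the `lost` list against the genes with the highest connectivity in the unfiltered module would show directly whether any candidate hubs disappeared from the visualisation.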