Question

WGCNA - module membership via max kME?


ly.leifels ▴ 10

@lyleifels-13624

Last seen 6.8 years ago

Hello,

I am working with a scRNA-seq dataset and want to analyse the module membership of low-abundance genes in gene co-expression networks generated with WGCNA. I found that the module color assigned by blockwiseModules() can differ from the module a gene would be assigned to based only on its maximum abs(kME) value from signedKME(). I compute the kME table for all modules from the module eigengenes returned by blockwiseModules(). The color assignment matters to me because I want to visualise genes switching modules during downscaling. For example, a gene's maximum kME points to the black module, while the $colors component of the blockwiseModules() result says it is in the grey module. The supplementary material of the WGCNA paper mentions an analysis step in which, after close modules are merged, genes with a higher kME for a module other than their assigned one are switched to the more highly correlated module. How can this difference still occur? How exactly are genes assigned to modules? Thank you for any hints!

wgcna modules • 7.0k views

ADD COMMENT • link updated 6.7 years ago by Peter Langfelder ★ 3.0k • written 6.8 years ago by ly.leifels ▴ 10

score 2 · Answer 1 · 2017-10-27

First things first: grey is not really a module, it is a label for unassigned genes, and the eigengene and kME for the grey "module" are more or less meaningless. In other words, ignore the eigengene and kME values for the grey "module".

WGCNA assigns module labels using dynamic tree cut (look up dynamicTreeCut) of a hierarchical clustering tree that is based on the Topological Overlap Measure (TOM). TOM gives a similarity that is close to, but not quite the same as, correlation, so for some genes the assigned module may differ from the module with the highest kME. Module merging can also play a part here.
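To illustrate why TOM and a direct pairwise similarity can disagree, here is a minimal plain-Python sketch of the standard unsigned topological overlap formula, TOM_ij = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij). This is not the WGCNA implementation (WGCNA is an R package and offers several TOM variants); the function name and the toy adjacency matrix are made up for illustration.

```python
def topological_overlap(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix.

    adj is a list of lists with values in [0, 1]; the diagonal is ignored.
    """
    n = len(adj)
    # Connectivity of each gene: sum of adjacencies to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal is 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of connections that i and j share through third genes.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy adjacency (hypothetical): genes 0 and 2 are weakly connected directly,
# but both are strongly connected to gene 1.
adj = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.8],
    [0.1, 0.8, 1.0],
]
tom = topological_overlap(adj)
print("direct adjacency 0-2:", adj[0][2])
print("topological overlap 0-2:", round(tom[0][2], 3))
```

In this toy case the shared neighbour pulls genes 0 and 2 much closer together in TOM than their direct adjacency suggests, which is the kind of effect that can place a gene in a module other than the one with which its profile correlates best.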

Practically speaking, genes will have a high kME to their assigned module. When assigned module and module of highest kME differ, the gene probably has high kME to both and can be considered intermediate between the two modules.
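A small sketch of how one might check this on a kME table: for each gene, find the module with the highest |kME| (skipping grey, since its kME is not meaningful) and report it next to the assigned module. The gene names, kME values and helper names below are invented for illustration; in a real analysis the table would come from signedKME() and the labels from blockwiseModules().

```python
# Hypothetical kME table: gene -> {module color -> signed kME}.
kme = {
    "geneA": {"black": 0.82, "blue": 0.79, "grey": 0.10},
    "geneB": {"black": 0.35, "blue": -0.91, "grey": 0.05},
    "geneC": {"black": 0.20, "blue": 0.15, "grey": 0.60},
}
# Hypothetical module labels for the same genes.
assigned = {"geneA": "blue", "geneB": "blue", "geneC": "grey"}


def max_kme_module(row, ignore=("grey",)):
    """Return the module with the largest absolute kME, skipping ignored labels."""
    candidates = {m: v for m, v in row.items() if m not in ignore}
    return max(candidates, key=lambda m: abs(candidates[m]))


def compare_assignments(kme, assigned):
    """For each gene, report the assigned module next to the max-|kME| module."""
    report = {}
    for gene, row in kme.items():
        best = max_kme_module(row)
        own = assigned[gene]
        report[gene] = {
            "assigned": own,
            "max_kme_module": best,
            "agrees": own == best,
            # kME to the grey label is not reported because it is not meaningful.
            "kme_assigned": row[own] if own != "grey" else None,
            "kme_best": row[best],
        }
    return report


report = compare_assignments(kme, assigned)
for gene, info in report.items():
    print(gene, info)
```

Here geneA is the "intermediate" case: it is assigned to blue but its kME to black is only slightly higher, so both values are high. geneC is unassigned (grey), and its highest non-grey kME is low.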

I don't really recommend this, but if you absolutely want all genes to be assigned to the module of highest kME, try using argument reassignThreshold=1 to blockwiseModules. This will re-assign all genes to the module of their highest kME after the initial modules have been identified. Note though that the reassignment is not iterated with module eigengene re-calculation.

In all, I don't worry about differences between the module assignment and the module of maximum kME in my own analyses, and I recommend that others not worry about them either.

Peter

score 1 · Answer 2 · 2017-11-02

It is hard to say why you don't get more genes in modules (it's like asking "why does my experiment not work" without actually telling people what you did in your experiment), but perhaps you should look into how many genes are actually detected (have counts > 0) in each cell, and how many genes have counts greater than a few (say 3) across a sufficiently large number of cells that a correlation analysis makes sense.
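The two checks suggested above can be sketched as follows, assuming a genes-by-cells matrix of raw counts. The matrix, the function names and the cutoffs (a count of at least 3 in at least 2 cells) are placeholders chosen for the toy example; for real data the minimum number of cells would need to be much larger.

```python
# Toy count matrix (hypothetical): rows are genes, columns are cells.
counts = [
    [0, 0, 1, 0],  # gene 0: barely detected
    [5, 3, 0, 7],  # gene 1: count >= 3 in three cells
    [3, 0, 0, 0],  # gene 2: count >= 3 in one cell only
    [4, 4, 4, 4],  # gene 3: count >= 3 in all cells
]


def genes_detected_per_cell(counts):
    """Number of genes with a count > 0 in each cell (column)."""
    n_cells = len(counts[0])
    return [sum(1 for row in counts if row[c] > 0) for c in range(n_cells)]


def genes_passing_filter(counts, min_count=3, min_cells=2):
    """Indices of genes with a count >= min_count in at least min_cells cells."""
    return [
        i
        for i, row in enumerate(counts)
        if sum(1 for x in row if x >= min_count) >= min_cells
    ]


detected = genes_detected_per_cell(counts)
kept = genes_passing_filter(counts)
print("genes detected per cell:", detected)
print("genes kept for correlation analysis:", kept)
```

If only a small fraction of genes passes such a filter, a large share of unassigned (grey) genes is not surprising, since correlations computed on mostly-zero profiles carry little information.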

score 0 · Answer 3 · 2017-10-27

Genes are assigned to modules using the TOM approach. There is some technical discussion of how it works on the WGCNA website.

When modules are merged, the correlation of each gene with the modules also changes. This implies a new module assignment, which could differ from what is expected (I have not looked at how frequently this happens).

score 0 · Answer 4 · 2017-11-02



Thank you for your responses!
I will only use the original module assignments for my thesis and the associated kME-values for those modules.
Why is it that nearly 90% of the 25,600 genes in my dataset (1,800 cells from mouse cortex) could not be assigned to any module? The dataset is very deeply sequenced to a depth of at least 5,000,000 total reads (median ~8,700,000, range ~3,800,000 - 84,300,000). Changing the minimum module size from 20 to 10 did not change the fact that 90% of all genes could not be clustered to a module.
Can I argue that this is simply because those genes have kME values too low to be assigned to a module? I detected 12 gene modules, of which 5 had fewer than 40 genes.
Thank you in advance!
best wishes,
Lydia

ADD COMMENT • link 6.7 years ago ly.leifels ▴ 10


From your description, it seems you are having an issue similar to the one I had when using WGCNA on single-cell RNA-seq data.

Does your gene dendrogram look something like this:

https://drive.google.com/open?id=1SdhPk8YAYuHrV6m0TtKH-3sDC_61gODm

If it does, and if that is what you mean by "Why is it that nearly 90% of the 25,600 genes in my dataset (1,800 cells from mouse cortex) could not be assigned to any module?", you are probably dealing with a heterogeneous sample. 1,800 cells is probably too many samples to run WGCNA on as a single set. Although WGCNA needs a reasonably large number of samples, too many samples can also be a problem because of sub-structure in the data.

I recommend you look into the SampleNetwork algorithm by Michael Oldham (https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/SampleNetwork/) which is designed to detect and remove outliers and undesired covariate effects from samples collected from separate papers/authors/sources. SampleNetwork should allow you to detect groups within your sample universe which you could analyze separately with WGCNA. Also, take a look at the sample dendrogram produced during the first steps of sample processing in WGCNA (covered in this tutorial https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-01-dataInput.pdf). This dendrogram should give you a general feel of outliers in your samples. SampleNetwork will, additionally, allow you to quantitatively assess the impact of these outliers in network connectivity and clustering (modularity) and to sequentially remove them until you get a sample group that is suited to co-expression network analysis.

Here is the sample dendrogram (produced in WGCNA, no SampleNetwork) of the samples used to produce the gene dendrogram I linked above. Note that there are many samples which cluster very distantly from others. This kind of structuring will negatively affect how WGCNA detects modules.

https://drive.google.com/open?id=1Dzu4SbQzCFBv7apaQ9OG895XON1QteEs

You are probably getting poor plots of the scale-free topology fit index versus soft-threshold (power), such as this:

https://drive.google.com/open?id=1l5Ena1yNpgJu96l20_7Qpj__aV89qoLX

This is definitely due to highly heterogeneous samples.

Here is a dendrogram of a subsample of the samples depicted in the above-linked dendrogram:

https://drive.google.com/open?id=1cNRdJK79fO8Ke7tL4cfCgCFlUc5iwyK3

And a gene dendrogram (obtained from WGCNA) like this:

https://drive.google.com/open?id=1JGbM8oFIQdGzBov0ND36ViJTP06k1neO

Hope this helped!

Best,

Thomaz Luscher

ADD REPLY • link 6.3 years ago luxeredias ▴ 20


How many samples do you have? What is your scale-free topology threshold? What coverage do you have? Are you filtering out low-abundance genes, or filtering genes in some other way?

ADD REPLY • link 6.7 years ago Lluís Revilla Sancho ▴ 730