Hello,
I am working with a scRNA-seq dataset and want to analyse module membership for low-abundance genes using gene co-expression networks generated with WGCNA. I found that the module-color assignments from blockwiseModules() can differ from the module a gene would be assigned to if I looked only at its maximum abs(kME) value from signedKME(). I compute the kME table for all modules from the module eigengenes returned by blockwiseModules(). The color assignment is important to me because I want to visualise genes switching modules during downscaling.

For example, going by its maximum kME value a gene would belong to the black module, while the $colors component returned by blockwiseModules() assigns it to the grey module. The supplementary material of the WGCNA paper mentions an analysis step in which, after close modules are merged, genes with a higher kME for a module other than the one they are assigned to are switched to that more highly correlated module.

How can this difference still occur? How exactly are genes assigned to modules? Thank you for any hints!
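To make the comparison concrete, here is a minimal sketch in plain Python (not the WGCNA implementation). It assumes kME is the Pearson correlation between a gene's expression profile and a module eigengene, and it takes the eigengenes and color labels as given. The function names, gene names and toy values are hypothetical and only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kme_table(expr, eigengenes):
    """Signed kME of every gene against every module eigengene."""
    return {
        gene: {mod: pearson(values, eig) for mod, eig in eigengenes.items()}
        for gene, values in expr.items()
    }


def find_mismatches(kme, colors):
    """Genes whose assigned color is not the module with the largest |kME|."""
    out = {}
    for gene, row in kme.items():
        best = max(row, key=lambda mod: abs(row[mod]))
        if best != colors[gene]:
            out[gene] = (colors[gene], best, row[best])
    return out


# Toy data (hypothetical): two module eigengenes over five samples.
eigengenes = {
    "black": [1.0, 2.0, 3.0, 4.0, 5.0],
    "blue": [2.0, -1.0, 2.0, -1.0, 2.0],
}
expr = {
    "geneA": [1.1, 2.0, 2.9, 4.2, 5.0],    # tracks the black eigengene
    "geneB": [2.1, -0.9, 1.8, -1.2, 2.0],  # tracks the blue eigengene
    "geneC": [0.9, 2.2, 3.1, 3.8, 5.1],    # tracks black, but labelled grey
}
colors = {"geneA": "black", "geneB": "blue", "geneC": "grey"}

mismatches = find_mismatches(kme_table(expr, eigengenes), colors)
for gene, (assigned, best, value) in mismatches.items():
    print(f"{gene}: assigned={assigned}, max |kME| module={best}, kME={value:.3f}")
```

Running the same comparison on the real kME table would list exactly the genes in question, together with the kME value that disagrees with the label. Note that grey is the label WGCNA uses for genes not assigned to any module, so a grey gene always shows up here.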
From your description, it seems you are running into an issue similar to one I had when analysing single-cell RNA-seq data with WGCNA.
Does your gene dendrogram look something like this:
https://drive.google.com/open?id=1SdhPk8YAYuHrV6m0TtKH-3sDC_61gODm
If it does, and if that is what you mean by "Why is it that nearly 90% of the 25,600 genes in my dataset (1,800 cells from mouse cortex) could not be assigned to any module?", you are probably dealing with a heterogeneous sample. 1,800 cells is probably too many samples to run WGCNA on as a single set. WGCNA does need a large number of samples, but with this many the data are likely to contain sub-structure (distinct groups of cells), which works against module detection.
I recommend you look into the SampleNetwork algorithm by Michael Oldham (https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/SampleNetwork/), which is designed to detect and remove outliers and undesired covariate effects in samples collected from separate papers/authors/sources. SampleNetwork should let you detect groups within your set of samples, which you could then analyze separately with WGCNA. Also take a look at the sample dendrogram produced during the first steps of sample processing in WGCNA (covered in this tutorial: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-01-dataInput.pdf). This dendrogram should give you a general feel for the outliers among your samples. In addition, SampleNetwork lets you quantitatively assess the impact of these outliers on network connectivity and clustering (modularity), and remove them sequentially until you are left with a sample group that is suited to co-expression network analysis.
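As a rough illustration of what "quantitatively assess outliers" can mean, here is a minimal plain-Python sketch of a standardized sample connectivity score. It is not the SampleNetwork code. The adjacency ((1 + cor) / 2)^2 between samples, the cutoff of -2, the function names and the toy data are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_connectivity(samples):
    """Z-score of each sample's summed adjacency to all other samples."""
    names = list(samples)
    k = {}
    for s in names:
        # Assumed adjacency: ((1 + cor) / 2) ** 2, which lies in [0, 1].
        k[s] = sum(
            ((1 + pearson(samples[s], samples[t])) / 2) ** 2
            for t in names
            if t != s
        )
    mu, sd = mean(k.values()), stdev(k.values())
    return {s: (k[s] - mu) / sd for s in names}


def flag_outliers(samples, z_cut=-2.0):
    """Names of samples whose standardized connectivity falls below z_cut."""
    z = standardized_connectivity(samples)
    return sorted(s for s, value in z.items() if value < z_cut)


# Toy data (hypothetical): seven similar samples and one reversed profile.
base = [1.0, 2.0, 3.0, 4.0, 5.0]
samples = {
    f"s{i}": [b + 0.1 * ((i * j) % 3) for j, b in enumerate(base)]
    for i in range(1, 8)
}
samples["s8"] = [5.0, 4.0, 3.0, 2.0, 1.0]

print(flag_outliers(samples))
```

With real data, samples would be removed one round at a time and the scores recomputed, since dropping one outlier changes the mean and spread for the rest.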
Here is the sample dendrogram (produced in WGCNA, without SampleNetwork) of the samples used to produce the gene dendrogram I linked above. Note that many samples cluster far away from the others. This kind of structure will negatively affect how WGCNA detects modules.
https://drive.google.com/open?id=1Dzu4SbQzCFBv7apaQ9OG895XON1QteEs
You are probably also getting poor plots of the scale-free topology fit index against the soft-threshold power, such as this:
https://drive.google.com/open?id=1l5Ena1yNpgJu96l20_7Qpj__aV89qoLX
This is definitely due to highly heterogeneous samples.
Here is the sample dendrogram for a subset of the samples shown in the dendrogram linked above:
https://drive.google.com/open?id=1cNRdJK79fO8Ke7tL4cfCgCFlUc5iwyK3
And the gene dendrogram for that subset (obtained from WGCNA) looks like this:
https://drive.google.com/open?id=1JGbM8oFIQdGzBov0ND36ViJTP06k1neO
Hope this helped!
Best,
Thomaz Luscher
How many samples do you have? What is your scale-free topology threshold? What coverage do you have? Are you filtering out low-abundance genes, or filtering genes in some other way?