I have a dataset in which I study the effects of a demethylating agent. I have 16 control untreated samples and 16 treated samples (with the demethylating agent) of moue haematopoetic stem cells. RNA-Seq data was obtained for all 32 samples. I used DESEq2 package to calculate DE. I intended to obtain the co-expression networks and the hub genes that modulate the expression difference due to treatment with the demethylating agent.
Performing unsupervised WGCNA on the entire gene list and merging modules with a threshold of 0.15, gave 5 modules. For choosing the most biologically interesting modules, I used Fisher's exact test to each module to check if significantly up/down-regulated genes are enriched in that given module. This gave me two interesting modules (with significant p values and high fold of more than estimated number of significantly DE genes).
Further, for each module I obtained the hub genes by listing the top ten genes with highest "kwithin". To evaluate the hub genes and to build pathways that make up each module, I used ReactomePA's cnetplot function to make a network connectivity plot (that shows general overall GO network connectivity of a module and also shows gene-gene connectivity among all networks within that module).
Note: I used the default unsigned network for building adjacency.
With regards to network building and evaluating hub genes, I have the following issues and questions:
1. Within a module, the hub genes do not show up in the reactomePA's cnetplot. Is this because these genes with high "kwithin" do not form a part of the enriched networks and are in some way non-confirming?
2. Is using an unsigned network here good? I did this because the demethylating agent was expected to increase expression level by activating the expression of tumour suppressor genes (the demethylating agent is a popular cancer drug Azacytidine). And I wanted to capture potential role of suppressors/enhancers and therefore wanted to account for both up/down-regulated genes by considering positive and negative expression correlations.
3. In general, what is a realistic "ktotal"? Is a ktotal of 145 (for the hubgene with kwithin as 1.0) belonging to a module with 360 genes too high?
4. If I get a p value=0 in the Fisher's exact test for evaluating the enrichment of the up/down regulated genes in a module but the enrichment is high, say ~10X than expected, should I consider this a biologically relevant module?
5. Later, as part of the study described above, some mice cell lines were also treated with Azacytidine, but this time the samples size is only 8(4 control and 4 treatment). I fully understand that WGCNA on 8 samples is not advisable, but since I used it in the previous study, I am tempted to use it for this as well. If I do, and I get a scale-free topology, what are some important ways I can evaluate if the modules formed are reliable? (If it does not make much sense at all to do so, I am planning on taking significantly DE genes from DESeq2 and using Cytoscape for network building. Any other suggestions would also be absolutely great!).
I find WGCNA very useful and love playing around with the options. Any suggestions how I can maximally tweak it to run my analyses would be great and thanks in advance!