Hello,
I have an RNA-seq dataset containing 200 samples from asthma patients beginning therapy with one of two drugs (drug A and drug B), i.e.:
Drug A: 50 samples pre-treatment (baseline)
Drug A: 50 samples one week post-treatment
Drug B: 50 samples pre-treatment (baseline)
Drug B: 50 samples one week post-treatment
I'm interested in using WGCNA to identify co-expressed gene modules in the pre-treatment baseline samples that correlate with various clinical traits. As part of a differential expression analysis I would then like to carry out gene set testing on these modules to see how they behave at week 1 compared to baseline, i.e. are the genes in the baseline modules upregulated or downregulated at week 1?
I have a question about strategy that I'd like some thoughts on:
Should I pool the drug A and drug B baseline samples and run an ordinary WGCNA analysis? This seems like it would increase power, as more samples would go into the WGCNA analysis. Or does it make more sense to identify separate modules in the drug A and drug B groups? The latter approach seems cleaner, as it involves identifying modules and testing them for differential expression in exactly the same patients.
Any opinions would be appreciated.

Hi Kevin. Thank you for your answer. I think I've settled on running separate analyses for each drug.
Hi Kevin,
Although it's late, I'd be happy to hear your answer.
Here, you mentioned that "You could, of course, control for either of these in the model, too". WGCNA performs an unsupervised analysis and can take all genes as input, not just DEGs. Could you please let me know whether we can run WGCNA without first doing a differential expression analysis? I downloaded some cancer microarray datasets from public databases that have no control samples, so I cannot do a DEG analysis. I'm thinking of running WGCNA on these cancer datasets. What's your opinion? Please kindly share your comments and suggestions.
Thanks in advance
Hey Sara, yes you could use this dataset, i.e., the one without controls. You could analyse each cancer separately and derive modules for these, or, process all cancers combined and hope to identify overlapping 'pathways' across different cancer types, which should exist (e.g. Wnt signalling, TGFbeta signalling, etc). Would be interesting. After you have identified modules, that's where the downstream stats / regression would occur using, in this case, survival info, tumour histology, tumour grade, age, etc.
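To make the downstream step a bit more concrete, here is a minimal Python sketch of correlating a module summary with a clinical trait. It is not WGCNA itself (which is an R package): the module score below is just the per-sample mean of z-scored expression, a simplified stand-in for a module eigengene that only makes sense when the module's genes are positively correlated. The gene names, expression values and the `tumour_grade` trait are all made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def zscore(values):
    """Standardise one gene's expression across samples (mean 0, SD 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def module_score(expr, module_genes):
    """Per-sample mean of z-scored expression over the genes in one module.

    A simplified stand-in for a module eigengene (an assumption of this
    sketch); only sensible when the module's genes are positively correlated.
    """
    z = [zscore(expr[g]) for g in module_genes]
    n_samples = len(z[0])
    return [mean(row[i] for row in z) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data: 3 hypothetical genes measured in 5 samples, plus one trait.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 5.9, 8.0, 10.2],
    "geneC": [0.5, 1.1, 1.4, 2.1, 2.4],
}
tumour_grade = [1, 1, 2, 3, 3]

score = module_score(expr, ["geneA", "geneB", "geneC"])
r = pearson(score, tumour_grade)
print(f"module-trait correlation: {r:.3f}")
```

With real data you would repeat this for every module and every trait, and for traits such as survival you would swap the correlation for an appropriate regression model.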
Thank you Kevin.
My dataset covers different subtypes of breast cancer, with multiple studies collected for each subtype (a meta-analysis); clinical traits are available for some of the datasets, but not all of them. Please kindly let me know which procedure you recommend for the WGCNA analysis: merging all of them and analysing the merged data, or analysing each subtype separately? Which one is more informative, especially for finding probable subtype-specific pathways and hub genes?
Yes, finding overlapping pathways among all subtypes would also be interesting. Consensus analysis with WGCNA is appropriate for this question, yes?
@slope If feasible, consider conducting both analyses. Start with a pooled analysis to identify broad patterns, then conduct separate analyses for drug A and drug B to explore any specific differences. This way, you can leverage the strengths of both approaches and validate your findings.
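For what it's worth, once modules are defined, the week 1 versus baseline comparison can be summarised both per drug and pooled. Below is a minimal Python sketch under some assumptions: each patient has one module score per visit (for example a module eigengene), the same patients were sampled at both visits, and a paired t statistic is used as a simple stand-in for formal gene set testing. All names and numbers are hypothetical toy values.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, week1):
    """Paired t statistic for week 1 minus baseline, one value per patient.

    Returns (mean difference, t statistic, degrees of freedom).
    """
    diffs = [w - b for b, w in zip(baseline, week1)]
    n = len(diffs)
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / sqrt(n))
    return d_bar, t, n - 1


# Toy module scores (hypothetical values), same patient order at both visits.
scores = {
    "drugA": {"baseline": [0.2, -0.1, 0.4, 0.0], "week1": [0.9, 0.5, 1.2, 0.6]},
    "drugB": {"baseline": [0.1, 0.3, -0.2, 0.0], "week1": [0.0, 0.4, -0.3, 0.1]},
}

# Separate analyses: one paired comparison per drug.
results = {}
for drug, visits in scores.items():
    results[drug] = paired_t(visits["baseline"], visits["week1"])

# Pooled analysis: concatenate both drug groups.
pooled_base = scores["drugA"]["baseline"] + scores["drugB"]["baseline"]
pooled_wk1 = scores["drugA"]["week1"] + scores["drugB"]["week1"]
results["pooled"] = paired_t(pooled_base, pooled_wk1)

for name, (d_bar, t, df) in results.items():
    print(f"{name}: mean change = {d_bar:+.3f}, t = {t:.2f}, df = {df}")
```

In this toy example the module moves clearly under drug A and not at all under drug B, so the pooled estimate sits in between. That is the kind of drug-specific difference a pooled-only analysis could blur.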