WGCNA coupled with differential expression analysis: question about baseline module detection
Colari19 ▴ 20
Last seen 9 months ago
United Kingdom


I have an RNA-seq dataset containing 200 samples from asthma patients beginning therapy with one of two drugs (drug A and drug B), i.e.:

  • Drug A: 50 samples pre-treatment (baseline)
  • Drug A: 50 samples one week post-treatment
  • Drug B: 50 samples pre-treatment (baseline)
  • Drug B: 50 samples one week post-treatment

I'm interested in using WGCNA to identify co-expressed gene modules in the pre-treatment baseline samples that correlate with various clinical traits. As part of a differential expression analysis I would then like to carry out gene set testing on these modules to see how they behave at week 1 compared to baseline, i.e. are the genes in the baseline modules upregulated or downregulated at week 1?

I have a question about strategy that I'd like some thoughts on:

Should I pool the drug A and drug B baseline samples and run an ordinary WGCNA analysis? This seems like it would increase power, as more samples would go into the WGCNA analysis. Or does it make more sense to identify separate modules in the drug A and drug B groups? The second approach seems cleaner, as it involves identifying modules and testing them for differential expression in exactly the same patients.

Any opinions would be appreciated.

Last seen 1 hour ago
V&A Waterfront, Cape Town, South Africa

Hey Colari19,

There is no right or wrong answer. Assuming that your datasets have been processed in the same way, they can be analysed together by WGCNA; however, the biological interpretation of the results may be more difficult in your particular case.

The way that WGCNA is designed, once you derive your modules, these can then be correlated with, or regressed against, your metadata to infer which modules are statistically significantly associated with it. For example, if the cyan module is associated with FEV1 via linear regression, and with disease status via binary logistic regression, then we would further explore the genes contained in the cyan module. In your study, you would have to regress your modules against parameters and combinations of parameters, which is not impossible, of course:

  • Drug
  • Timepoint
  • Drug:Timepoint

However, by just regressing against drug, the result may be confounded by timepoint, and vice-versa. You could, of course, control for either of these in the model, too.
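To make the model above concrete, here is a minimal sketch (plain Python, not part of WGCNA). It assumes you already have one summary value per sample for a module, e.g. its eigengene. With two drugs and two timepoints, the model eigengene ~ Drug + Timepoint + Drug:Timepoint is saturated, so under treatment coding its least-squares coefficients are simply contrasts of the four cell means. The function name and all numbers are made up for illustration, and the sketch ignores the fact that baseline and week 1 samples come from the same patients.

```python
from statistics import mean

def two_by_two_coefficients(eigengene, drug, timepoint):
    """Least-squares coefficients of eigengene ~ drug + timepoint + drug:timepoint.

    Reference levels are drug "A" and timepoint "baseline". Because the model
    is saturated, each coefficient is a contrast of cell means.
    """
    # Group the eigengene values by (drug, timepoint) cell.
    cells = {}
    for value, d, t in zip(eigengene, drug, timepoint):
        cells.setdefault((d, t), []).append(value)
    m = {key: mean(values) for key, values in cells.items()}

    time_in_a = m[("A", "week1")] - m[("A", "baseline")]
    time_in_b = m[("B", "week1")] - m[("B", "baseline")]
    return {
        "intercept": m[("A", "baseline")],
        "drug": m[("B", "baseline")] - m[("A", "baseline")],
        "timepoint": time_in_a,                   # time effect within drug A
        "drug:timepoint": time_in_b - time_in_a,  # difference in time effects
    }

# Made-up eigengene values for one module, two samples per cell.
eigengene = [0.10, 0.12, 0.30, 0.34, 0.11, 0.09, 0.15, 0.13]
drug = ["A", "A", "A", "A", "B", "B", "B", "B"]
timepoint = ["baseline", "baseline", "week1", "week1"] * 2

coefs = two_by_two_coefficients(eigengene, drug, timepoint)
for name, value in coefs.items():
    print(f"{name}: {value:+.3f}")
```

An interaction term far from zero would suggest that the module changes differently over time under the two drugs. For real data you would also want standard errors and p-values, and a way of accounting for patient pairing, which this sketch does not provide.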

Doing 2 separate WGCNA analyses, one for each drug, and then just checking for module associations with Timepoint, may work better. I think that your sample numbers are okay for this.
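As a rough illustration of checking one module against Timepoint within a single drug, the sketch below computes a paired t statistic on per-patient differences (week 1 minus baseline). It assumes you have a module summary value, such as an eigengene, for every patient at both timepoints, listed in the same patient order. The values and the function name are invented for the example; this is one simple option, not the only valid test.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(baseline, week1):
    """Paired t statistic and degrees of freedom for week1 - baseline."""
    if len(baseline) != len(week1):
        raise ValueError("each patient needs one baseline and one week-1 value")
    # One difference per patient, so between-patient variation cancels out.
    diffs = [w - b for b, w in zip(baseline, week1)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up eigengene values for one module in the drug A group,
# one entry per patient, same patient order in both lists.
baseline = [0.10, 0.05, -0.02, 0.08, 0.01]
week1 = [0.20, 0.13, 0.05, 0.15, 0.12]

t_stat, df = paired_t(baseline, week1)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A positive statistic means the module summary is higher at week 1 than at baseline. To get a p-value you would compare it against a t distribution with the reported degrees of freedom, and with many modules you would also correct for multiple testing.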



Hi Kevin. Thank you for your answer. I think I've settled on running separate analyses for each drug.


Hi Kevin,

Although it's late, I'd be happy to hear your answer.

Here, you mentioned that "You could, of course, control for either of these in the model, too". WGCNA performs an unsupervised analysis and can take all genes as input, not just DEGs. Could you please let me know whether we can do the WGCNA analysis without first doing a differential expression analysis? I downloaded some cancer microarray datasets from public databases that have no control samples, so I cannot do a DEG analysis. I'm thinking of doing a WGCNA analysis on these cancer datasets. What's your opinion about it? Please kindly share your comments and suggestions with me.

Thanks in advance


Hey Sara, yes you could use this dataset, i.e., the one without controls. You could analyse each cancer separately and derive modules for these, or, process all cancers combined and hope to identify overlapping 'pathways' across different cancer types, which should exist (e.g. Wnt signalling, TGFbeta signalling, etc). Would be interesting. After you have identified modules, that's where the downstream stats / regression would occur using, in this case, survival info, tumour histology, tumour grade, age, etc.
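For a numeric or ordinal trait such as age or tumour grade, the simplest version of that downstream step is a correlation between a module's per-sample summary value (e.g. its eigengene) and the trait. Below is a minimal sketch in plain Python with invented values; the function name is hypothetical. Survival would need a different kind of model and is not covered here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up values: one module eigengene and tumour grade for five samples.
eigengene = [-0.3, -0.1, 0.0, 0.2, 0.4]
grade = [1, 1, 2, 3, 3]

r = pearson(eigengene, grade)
print(f"module-trait correlation: {r:.3f}")
```

You would repeat this for every module and trait of interest, and then adjust the resulting p-values for the number of tests.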


Thank you Kevin.

My dataset covers different subtypes of breast cancer, with multiple studies collected for each subtype (a meta-analysis); clinical traits are available for some of the datasets, but not all of them. Please kindly let me know which procedure you recommend for the WGCNA analysis: merging all of them and analysing the merged data, or analysing each subtype separately? Which one is more informative, especially for finding probable subtype-specific pathways and hub genes?

Yes, finding overlapping pathways among all subtypes would also be interesting. Would consensus analysis with WGCNA be appropriate for this question?

