Question

WGCNA: relate modules to binary trait

2

Entering edit mode

Lin ▴ 50

@lin-19103

Last seen 4.4 years ago

Hey!

I am going through the very helpful WGCNA tutorials and constructed a co-expression network, but now I am stuck with further steps and interpretation of my modules.

At first some information regarding my data and progress:

I have 60 samples from humans (assigned to 2 groups a 30 people)
Microarray data, pre-processed (batch effects, normalization, log2 transformation)
as input I used ~ 5000 genes, which I previously filtered variance-based
I decided to use a signed & weighted network (followed step-wise tutorial)
I ended up with 16 modules

Then, I tried to correlate the eigengene values of the modules with the external trait information. I was insecure if I can use correlation for binary information (my only "trait" information is if the sample is from human with/ without disease). However, I´ve read all posts I was able to find about this and there it was discussed that this is possible and basically a t-test between the groups.

So my code was this: moduleTraitCor = cor(MEs, Anno_180_T, use = "p"); Where I have my eigengenes in MEs and my sample class info in Anno_180_T.

So do you think this approach is ok so far?

My problem is: No single correlation between module eigengenes and the binary trait is significant.

I don´t know what I could do know, even functional enrichment does not really make sense when there is no relationship at all...

My question of interest was if I could find differentially co-expressed modules between disease/healthy persons. And I have additional data from 2 other time points which I thought to maybe analyze with consensus analysis/ eigengene networks...

Thank you for any help/ suggestions!!

PS: I also had a look at CEMiTool, which is somewhat based on WGCNA I think... There they seem to do gene set enrichment analysis for group differences. As far as I understood it mathematically, the difference is that there the expression of the whole module goes in, not only the eigengene. I tried this for my sample and every module was significant... I am really confused.

WGCNA module eigengene binary trait R • 4.8k views

ADD COMMENT • link updated 6.0 years ago by pedrostrusso ▴ 30 • written 6.1 years ago by Lin ▴ 50

score 2 · Answer 1 · 2019-01-02

2

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 3 months ago

United States

It is not uncommon to find no relationship between disease and co-expression modules. The first thing I would do is run a individual gene DE analysis (using say limma, but you could also use standardScreeningBinaryTrait in WGCNA) to see how much signal there is on the individual gene level. I would also perhaps not filter by variance on log-transformed data since, after log-transformation, the low-expressed probes tend to have more variance than the higher-expressed ones. I'd filter by mean expression and try to keep at least 10k genes in the analysis.

If you have additional time points, you could run the same analysis in the additional time points, especially the individual gene association analysis to see whether the individual gene associations among the three sets are at least somewhat concordant. Then you can try a consensus network analysis.

ADD COMMENT • link 6.1 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Thank you for your answer!

The DE analysis was already done. There were some differences in for example immune-system related pathways, what makes sense regarding this disease. The most differences were observed for the 3. timepoint, what was the reason why I started to use this here.
I wanted to use co-expression network analysis because I read that it can often uncover relationships that are not found by DE analysis and is better regarding some points (for example the multiple testing problem).
I am not really familiar with different filtering methods, but I used that implemented in CEMiTool, as it was especially designed for co-expression network analysis. Then I used the probes I got there as the input for WGCNA.

Could you give me any further advice what would be good to do now then?

It is also not really clear for me, where lies the difference between Gene Set Enrichment between groups in CEMiTool and the analysis in WGCNA. Because I did not compare the modules completely, but at least I ended up with 16 modules in both analyses. However, the group comparison is significant for EVERY module in CEMiTool but for NO module in WGCNA. I understand that there are differences in the computation, but I really wonder what this then tells me biologically, how can I interpret this, how should I proceed...

ADD REPLY • link 6.1 years ago Lin ▴ 50

score 1 · Answer 2 · 2019-01-07

1

Entering edit mode

pedrostrusso ▴ 30

@pedrostrusso-13086

Last seen 4.2 years ago

Hi Lin, thanks for using CEMiTool!

CEMiTool's automatic filter is, as you may have noticed, quite stringent, and it tends to leave in the analysis mostly genes with significant activity in their given pathways. This is probably why you found that all your modules ended up being differentially expressed between your sample classes. You could try making the filter less stringent by playing around with the filter_pval parameter in the cemitool function, set by default to 0.1.

Also, since you applied a log-transformation to your data, it is possible that you introduced a dependency between the mean and the variance, which affects correlation-based analyses. You can check this by observing a straight line when using the plot_mean_var function on your CEMiTool object, which if present, you can try to correct by setting the apply_vst parameter in the cemitool function to TRUE.

ADD COMMENT • link 6.0 years ago pedrostrusso ▴ 30

1

Entering edit mode

Thanks for your answer Pedro!

Do you think I should use a more stringent parameter, because I thought that ~ 5000 genes would be a good sample? I will attach a plot from the mean& variance here. I think the data shows no straight line, but what could I read further from this/ what would you do know?

Additionally, I was not sure if spearman would be the better option, as the diagnostics when I used pearson were not optimal I think. This is the qqplot. However, when I used spearman I had the strange case, that in one module the NES was ~ 0 in both groups, but nevertheless the p-value was significant! What does that mean?? Didn´t I get it right that in CEMiTool a significant p-value in GSEA means that the ranked mean expressions from genes in this module is different between the groups? While in WGCNA sth like a PCA is applied first and the first eigengene is compared between groups?

I would be grateful for any further answer or help!

ADD REPLY • link 6.0 years ago Lin ▴ 50

1

Entering edit mode

Hi Lin, sorry, I didn't notice the 5000 genes you used as input. Since you previously filtered your dataset, did you remember to include the filter=FALSE parameter in your cemitool function call? Otherwise, the function will filter your already filtered dataset, which would probably cause some bias. Alternatively, you could insert your entire dataset (usually ~10-20K genes) into the cemitool function and let it do the filtering for you instead (which I would personally recommend).

As for your GSEA question, CEMiTool uses the GSEA algorithm implemented in the fgsea package, so I'd encourage you to ask more specific questions to the authors, but what I can say is that the algorithm does not seem to be very stringent in detecting group differences, so unless it looks like there are very strong differences between the classes, you should probably take the p-value significance with a grain of salt. This is because GSEA is recommended as a hypothesis-generating algorithm, not necessarily as a be-all and end-all answer.

Finally, at least one paper suggests that there might not be a significant difference in coexpression analysis results if you use Pearson or Spearman correlations, depending on a large enough number of samples (which seems to be your case).

ADD REPLY • link 6.0 years ago pedrostrusso ▴ 30

1

Entering edit mode

I used the CEMiTool function for filtering! I was just thinking that the end result with ~ 5000 genes is good, so I did not change the filter value.

What do you think about the plot of mean and variance, it does not look like I need the apply_vst parameter? And would you rather use pearson or spearman with this data (based on the qqplot)?

Thanks anyway for all your answers!

And regarding the group comparsion: It is more that I really wonder where the difference lays between WGCNA and CEMiTool (because in the group comparison they lead to very different results). It seems to me that the network construction is quite similar, with some new features offered by CEMiTool (e.g. filtering and beta parameter selection). However, I need a group comparison with reliable/ reasonable results and there I am not sure why they differ in their approaches and what would be the best line to follow.

ADD REPLY • link 6.0 years ago Lin ▴ 50

1

Entering edit mode

You're right, it doesn't look like you need the apply_vst parameter. About the qqplot, it looks like you could go either way... maybe Spearman could be a little more on the safe side, but I don't think Pearson would be a bad choice either.

About the group comparisons, as you said before, both packages use very different methods, so different outcomes are to be expected. It would be interesting to see how WGCNA's eigengene approach relates to the GSEA approach though, maybe you could enlighten us (:

ADD REPLY • link 6.0 years ago pedrostrusso ▴ 30