Question

WGCNA - Compare module preservation between classes

6


abellonau ▴ 60

@abellonau-8526

Last seen 7.8 years ago

European Union

Let's say I have two classes, control and disease. So far, I have built a WGCNA network using the full dataset (both classes together) and identified modules that are related to the trait of interest. However, when I compare this to networks built using only the data from each class individually, some modules are not well preserved (by qualitative inspection). In the module preservation paper [Langfelder et al. Is My Network Module Preserved and Reproducible?] and related tutorials, module preservation is usually assessed between different datasets: modules identified in one dataset are assessed in another. In my case, that would mean identifying modules in the control group and assessing their preservation in the disease group. However, done this way, I have no means of assessing the relevance of the modules to the trait of interest (except to vaguely conclude that certain modules identified in the control group are poorly preserved in the disease group).

Instead, it makes sense to me to assess how the structure/connectivity within modules differs between classes by using the module labels identified from the full dataset, then calculating statistics such as kME with these labels within each class individually and seeing how they differ. However, I have not seen this done before. Is my strategy valid? Any advice is welcome! Many thanks.

WGCNA coexpression • 4.8k views

Updated 9.0 years ago by Peter Langfelder ★ 3.0k • written 9.0 years ago by abellonau ▴ 60

score 7 · Answer 1 · 2015-08-02

I think you have basically three options. The first is to define the modules in the full data set and study their preservation in the control and disease samples. In the language of the module preservation article, the reference set would be the full data set, and you would have two test data sets, namely the control and disease subsets. The advantage here is that you work with the full modules and you have a clear idea of how they relate to disease status. A possible disadvantage is that it doesn't compare controls to cases directly, so the preservation statistics may not give you a clear answer to "what is different between cases and controls".
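To make the first option more concrete, here is a minimal sketch of one density-type statistic: the mean absolute pairwise correlation among the genes of a module, computed separately in the control and the disease samples using module labels from the full data. It is only an illustration in plain Python, not WGCNA's `modulePreservation` (there is no permutation test and no Zsummary), and the gene names, expression values, and sample split are all made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def module_density(expr, genes, samples):
    """Mean absolute pairwise correlation among `genes`, using only `samples`."""
    cors = [
        abs(pearson([expr[g][s] for s in samples], [expr[h][s] for s in samples]))
        for g, h in combinations(genes, 2)
    ]
    return sum(cors) / len(cors)


# Hypothetical toy data: expr[gene] holds values for 8 samples.
expr = {
    "g1": [1, 2, 3, 4, 1, 2, 3, 4],
    "g2": [2, 4, 6, 8, 4, 1, 3, 2],
    "g3": [1, 3, 5, 7, 2, 2, 2.5, 2],
}
module = ["g1", "g2", "g3"]   # one module, labels taken from the full data set
control = [0, 1, 2, 3]        # assumed sample indices of the controls
disease = [4, 5, 6, 7]        # assumed sample indices of the cases

density_control = module_density(expr, module, control)
density_disease = module_density(expr, module, disease)
print(density_control, density_disease)
```

In this toy example the module is tightly correlated in the controls and much looser in the cases. With real data you would still need a permutation-based reference distribution to judge whether such a difference is meaningful.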

Second, you could attempt a preservation study of the full modules between the control and disease subsets, but that could be problematic because it may break the assumption that the reference modules are well-defined in the reference set.

The third possible strategy is to construct modules individually in each of the two data sets, study their preservation, identify preserved and non-preserved modules, then apply the module labels (for example, from the controls) to the full (combined) data set and see which modules relate to disease status (or how the control modules overlap with the full modules). This would have the advantage of being a more sharply defined preservation study, but the relationship of the modules to disease may be more difficult to quantify and interpret.
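The overlap between control modules and full-data modules mentioned above can be sketched as a cross-tabulation of the two labelings, with a one-sided hypergeometric p-value for each pair of modules. This is a plain Python illustration under assumed inputs, not the WGCNA implementation; the gene names and module colors are invented.

```python
from collections import Counter
from math import comb


def overlap_table(labels_a, labels_b):
    """Count genes shared by each pair of modules from two labelings."""
    return Counter(
        (labels_a[g], labels_b[g]) for g in labels_a if g in labels_b
    )


def hypergeom_pvalue(k, size_a, size_b, n_genes):
    """P(overlap >= k) if the size_b genes were drawn at random from n_genes,
    of which size_a belong to module A."""
    upper = min(size_a, size_b)
    hits = sum(
        comb(size_a, i) * comb(n_genes - size_a, size_b - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_genes, size_b)


# Hypothetical labels for the same 8 genes in two analyses.
control_labels = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "blue",
                  "g5": "brown", "g6": "brown", "g7": "brown", "g8": "brown"}
full_labels = {"g1": "turquoise", "g2": "turquoise", "g3": "turquoise",
               "g4": "turquoise", "g5": "yellow", "g6": "yellow",
               "g7": "yellow", "g8": "turquoise"}

counts = overlap_table(control_labels, full_labels)
size_control = Counter(control_labels.values())
size_full = Counter(full_labels.values())

# Overlap of control "blue" with full-data "turquoise".
p_blue_turquoise = hypergeom_pvalue(
    counts[("blue", "turquoise")],
    size_control["blue"],
    size_full["turquoise"],
    len(control_labels),
)
print(counts[("blue", "turquoise")], p_blue_turquoise)
```

A control module that overlaps strongly with a disease-related full-data module is a natural candidate for follow-up, though with many module pairs the p-values would need a multiple-testing correction.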

I guess it's up to you to weigh the pros and cons of each approach. They all are relatively simple to execute (you could have all three results within an easy day of work) but the interpretation of the results may be much more difficult.