Question

WGCNA: single channel microarray, one categorical covariate, how to apply it?

jeiroje ▴ 20

Italy

Dear all,

I have just started using WGCNA and cannot quite adapt the tutorials to my case.

I have single-channel microarray data from 40 samples: 6 are control samples and the others come from 10 different treatments (3 to 6 samples per treatment). The treatment is the only difference among the samples, and so the only "clinical trait" that can be related to the analysis.

My data are log2-transformed, quantile-normalized intensity signals from a single-channel source, and are therefore positive and possibly quite large. In contrast, the sample data used with WGCNA come from a two-channel source and are given as the ratio of the mean log10 intensity, i.e. they are small in magnitude and can be negative (GEO accession: GSE2814).

My goal is to build a coexpression network to be used for pathway detection, so it would be nice to have one network for each treatment. But since I will perform some other analyses before pathway detection, it is not really a problem if I get a single network using all the samples together. Moreover, according to the WGCNA FAQ page (http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html), it is not advisable to run WGCNA with fewer than 20 samples, which is far more than I have for any single treatment.

These are my most pressing questions, with possible workarounds:

1) How do I go from a two-channel microarray to a single-channel microarray? At first I thought of using the logFCs produced by the limma package, but the WGCNA FAQ page mentioned above warns that differential expression analysis should not be performed before WGCNA.

2) Can I treat my samples as if they came from a population in which the only difference among subjects is the treatment? The treatment would then be the only covariate, in effect a single factor with 10+1 levels (as is normally done in regression models).

I am very hesitant about this last point because I fear that only genes that behave in exactly the same way across all treatments will be clustered together, or merged together in a static way, whereas I would prefer to highlight clusters according to the treatment (and I cannot tell whether this is possible with WGCNA).

All in all, do you think WGCNA is the appropriate tool for my kind of question?

Thanks in advance.

wgcna twochannel onechannel • 3.0k views

updated 9.7 years ago by Peter Langfelder ★ 3.0k • written 9.7 years ago by jeiroje ▴ 20

score 1 · Answer 1 · 2014-11-10

To start with your last question, your data seem a good candidate for WGCNA, but I can't promise you'll find the results you're looking for.

Quantile-normalized, log-transformed single-channel data are a good starting point for WGCNA; in fact, most WGCNA analyses I am aware of are done on single-channel or RNA-seq data.

I recommend binarizing the treatment, that is, creating 10 or 11 binary variables, one for each level (10 for the treatments alone, 11 if the control gets its own indicator as well). The binary variable for level L equals 1 when the treatment is L and 0 otherwise.
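To make the binarization concrete, here is a minimal sketch in plain Python (WGCNA itself is an R package, so this only illustrates the idea and is not a WGCNA call). The function name `binarize_treatment` and the sample labels are hypothetical.

```python
def binarize_treatment(labels):
    """Return {level: [0/1 per sample]}, one binary indicator per treatment level."""
    levels = sorted(set(labels))
    return {
        level: [1 if label == level else 0 for label in labels]
        for level in levels
    }


# Hypothetical annotation: one treatment label per sample, in sample order.
labels = ["control", "control", "trtA", "trtA", "trtB", "trtB", "trtB"]
indicators = binarize_treatment(labels)

for level, column in indicators.items():
    print(level, column)
```

Each sample has a 1 in exactly one indicator, so dropping the control column gives the 10-variable version and keeping it gives the 11-variable version.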

Follow tutorial I; start in section 1 by clustering the samples and plotting the treatment (the binary indicators) below the tree. If you see major splits (branches) in the tree, your data may be heterogeneous. If the splits correspond to treatments, the heterogeneity comes from the treatment; if they do not, you may also have batch effects or other technical effects.

You can run WGCNA on your entire data or filter the probes/genes by mean expression or variance. Construct the network and identify modules; then relate the modules to the 10 or 11 binary variables (or run a standard regression of module eigengenes on the treatment factor) to identify modules significantly related to the various treatments.
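The step of relating modules to the binary variables can be sketched as a table of Pearson correlations between each module eigengene and each indicator. This is a plain-Python illustration under stated assumptions, not WGCNA code: the eigengene values, the module name `blue`, and the helper names are all made up, and in practice the eigengenes would come from the module detection step.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def module_trait_correlations(eigengenes, indicators):
    """Return {module: {trait: r}} for every module/indicator pair."""
    return {
        module: {trait: pearson(profile, ind) for trait, ind in indicators.items()}
        for module, profile in eigengenes.items()
    }


# Hypothetical inputs: one eigengene value per sample, plus binary treatment indicators.
eigengenes = {"blue": [0.1, 0.2, 0.1, 0.2, 0.9, 1.0]}
indicators = {
    "control": [1, 1, 0, 0, 0, 0],
    "trtA": [0, 0, 1, 1, 0, 0],
    "trtB": [0, 0, 0, 0, 1, 1],
}
cors = module_trait_correlations(eigengenes, indicators)
for trait, r in cors["blue"].items():
    print(f"blue vs {trait}: r = {r:.2f}")
```

A module with a high absolute correlation to one indicator is a candidate for being specific to that treatment; significance testing and multiple-testing correction are left out of this sketch.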

It may be that one treatment, or a group of treatments, is really different from the others and swamps the rest of the expression signal. If this happens, you would also see it in the clustering tree of the samples. In that case you may want to create a second version of the data in which the big effect is adjusted for, and run a second analysis on it to identify more subtle changes that were hidden by the large signal. See WGCNA FAQ item 4 for some discussion of this.
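One simple way to adjust for a single large effect is to take, for each gene, the residuals after regressing expression on a binary indicator of the outlying group, which amounts to centering each group at its own mean. The sketch below assumes that approach; it is an illustration in plain Python with hypothetical names and data, and is not necessarily the adjustment the FAQ has in mind.

```python
def remove_binary_effect(values, indicator):
    """Residuals of one gene after regressing on a 0/1 indicator (per-group centering)."""
    groups = {}
    for value, group in zip(values, indicator):
        groups.setdefault(group, []).append(value)
    means = {group: sum(vals) / len(vals) for group, vals in groups.items()}
    return [value - means[group] for value, group in zip(values, indicator)]


def adjust_expression(expression, indicator):
    """Apply the adjustment to every gene in {gene: [value per sample]}."""
    return {
        gene: remove_binary_effect(values, indicator)
        for gene, values in expression.items()
    }


# Hypothetical data: the last three samples belong to the dominant treatment group.
big_effect = [0, 0, 0, 1, 1, 1]
expression = {"geneX": [8.0, 8.2, 8.1, 11.0, 11.2, 11.1]}
adjusted = adjust_expression(expression, big_effect)
print([round(v, 2) for v in adjusted["geneX"]])
```

After the adjustment the two groups share the same mean for every gene, so the network built on the adjusted data is driven by variation within groups rather than by the large between-group shift.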

Hope this helps,

Peter