Dear all,
I'm new to WGCNA and interested in the differences in expression between 10 tolerant and 10 sensitive plants, using RNA-seq data.
What I have understood so far is that one can start from the full dataset (e.g. 20 samples), build modules, and then test whether those modules are preserved in either the tolerant or the sensitive plants. So it seems to me that you look at the overall correlation of expression across all samples and then try to determine which modules are specific to either the tolerant or the sensitive plants. However, since you start from a combined dataset, I am wondering about the biological relevance of a correlation analysis on the full dataset. For example, what happens to genes that are highly expressed in the tolerant group but not expressed, or only weakly expressed, in the sensitive group? Will they still end up in a module in the full dataset?
Intuitively, I'd like to assess the two datasets separately (tolerant, 10 samples, and sensitive, 10 samples) and see which modules change in the sensitive group compared to the tolerant group. However, with this approach I'd have only 10 replicates per network, and I've read that the minimum for the analysis is 15.
Can anyone offer some advice or explain what type of analysis would be useful here?
Much appreciated,
Nicky
I don't think WGCNA is the best type of analysis here, because you are looking at just one trait with just two groups, i.e. tolerant and sensitive. You could simply correlate the modules with a dummy variable coding the group, but that seems a bit clumsy as a style of analysis. You're right that you lack the power to compare modules between the two plant groups. I'd start with a conventional limma analysis and then cluster: you could just cluster the probes/genes and cut out modules using R.
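To make the dummy-variable idea concrete, here is a minimal sketch in plain Python (not WGCNA itself, and not real data). It assumes each module has already been summarised as one value per sample; the names `module_a`, `module_b` and `group`, and all the numbers, are made up for illustration. With a 0/1 group code, the Pearson correlation just measures how well the module summary separates the two groups.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Dummy variable: 1 = tolerant, 0 = sensitive (10 samples each).
group = [1] * 10 + [0] * 10

# Hypothetical module summary that is shifted between the groups.
tolerant_a = [0.25, 0.30, 0.35, 0.28, 0.32, 0.27, 0.33, 0.29, 0.31, 0.30]
module_a = tolerant_a + [-v for v in tolerant_a]

# Hypothetical module summary with the same values in both groups,
# i.e. no group difference at all.
pattern_b = [0.4, -0.4, 0.2, -0.2, 0.1, -0.1, 0.3, -0.3, 0.0, 0.0]
module_b = pattern_b + pattern_b

r_a = pearson(group, module_a)  # strong association with the group code
r_b = pearson(group, module_b)  # no association with the group code

print("module_a vs group:", round(r_a, 3))
print("module_b vs group:", round(r_b, 3))
```

With only two groups this carries essentially the same information as a two-sample comparison of the module summary between tolerant and sensitive plants, which is part of why a per-gene differential expression analysis is the more direct starting point here.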