I have RNA-seq data from 5 control and 15 case samples, and I want to find modules related to the disease I'm studying. One of the steps in WGCNA is building the correlation matrix that is later used to find co-expressed genes. From Peter Langfelder's previous posts on this forum, I understand that all samples (case + control) should be used together. But it is not clear to me how this works: if, for example, gene A correlates with gene B in controls but not in cases, then the correlation value will be based on mixed signals. What does this value really mean then? Is the assumption here that all genes stay correlated in both groups and only the expression values differ?
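To make the concern concrete, here is a minimal sketch in Python with NumPy (not WGCNA itself). The data are synthetic and made up for illustration, and the function name `group_correlations` is hypothetical. Gene B is constructed to follow gene A in the 5 controls and to be independent of it in the 15 cases, so the pooled Pearson correlation blends the two situations.

```python
import numpy as np

def group_correlations(gene_a, gene_b, is_control):
    """Return Pearson r for controls only, cases only, and all samples pooled."""
    a = np.asarray(gene_a, dtype=float)
    b = np.asarray(gene_b, dtype=float)
    ctrl = np.asarray(is_control, dtype=bool)

    def pearson(x, y):
        return float(np.corrcoef(x, y)[0, 1])

    return {
        "control": pearson(a[ctrl], b[ctrl]),
        "case": pearson(a[~ctrl], b[~ctrl]),
        "pooled": pearson(a, b),
    }

# Synthetic data (assumption, for illustration only): 5 controls, 15 cases.
rng = np.random.default_rng(0)
is_control = np.array([True] * 5 + [False] * 15)

gene_a = rng.normal(size=20)
gene_b = rng.normal(size=20)          # independent of gene A by default
gene_b[:5] = 2.0 * gene_a[:5] + 1.0   # controls only: B is a linear function of A

print(group_correlations(gene_a, gene_b, is_control))
```

In this toy setup the control-only correlation is perfect by construction, while the pooled value is lower because it mixes in the uncorrelated case samples. Real expression data would of course be far less clean than this.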
What sounds logical to me (an undergrad, so please bear with me) is to create modules for cases and modules for controls separately, and then search for modules that are not preserved between the two. These would then be the modules related to the disease. Unfortunately, with my small sample size I don't think this is an option for me. (Maybe it is?)
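The sample-size worry can be sketched with a small simulation in Python with NumPy. This is only an illustration under a simplifying assumption (two independent, normally distributed genes), and `null_abs_r_quantile` is a hypothetical helper, not a WGCNA function. It estimates how large a Pearson correlation can look by chance alone for a given number of samples.

```python
import numpy as np

def null_abs_r_quantile(n_samples, n_trials=5000, q=0.95, seed=0):
    """Quantile of |r| between two independent genes measured in n_samples samples."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_trials, n_samples))
    y = rng.normal(size=(n_trials, n_samples))

    # Row-wise Pearson correlation: centre each trial, then normalise.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

    return float(np.quantile(np.abs(r), q))

# Compare the 5 controls alone, the 15 cases alone, and all 20 samples together.
for n in (5, 15, 20):
    print(n, round(null_abs_r_quantile(n), 2))
```

The chance-level threshold shrinks as the number of samples grows, which is one way to see why a network built from only 5 controls would be much noisier than one built from all 20 samples.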
Thanks in advance.
Edit: I have 5 wild-type samples and 15 disease samples. I'm interested in the processes related to the disease: what happens in the body at the transcriptional level in disease samples? I think WGCNA is a good fit because the modules contain co-expressed genes that are thus (hopefully) related to a shared function. Modules with high correlation to trait data (and perhaps with many DEGs?) should be annotated to see what functions their gene products perform.
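As a rough sketch of what "module correlated with trait data" means, the Python/NumPy snippet below summarises one module by its eigengene (the first principal component of the standardised expression of its genes) and correlates it with a 0/1 disease indicator. This is not the WGCNA implementation; the function names and the synthetic module are assumptions for illustration.

```python
import numpy as np

def module_eigengene(expr):
    """expr: samples x genes array for one module. Returns one value per sample."""
    expr = np.asarray(expr, dtype=float)
    # Standardise each gene so that no single gene dominates the component.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of an SVD component is arbitrary; orient it along the average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def eigengene_trait_correlation(expr, trait):
    """Pearson correlation between a module eigengene and a per-sample trait."""
    trait = np.asarray(trait, dtype=float)
    return float(np.corrcoef(module_eigengene(expr), trait)[0, 1])

# Synthetic module (assumption): 10 genes that are all higher in the 15 disease samples.
rng = np.random.default_rng(1)
trait = np.array([0] * 5 + [1] * 15)   # 0 = wild type, 1 = disease
module_expr = 3.0 * trait[:, None] + rng.normal(size=(20, 10))

print(eigengene_trait_correlation(module_expr, trait))
```

A module whose eigengene correlates strongly with the disease indicator would be a candidate for functional annotation.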