While looking into WGCNA, I saw that the recommended minimum sample size is 15, because:
correlations on fewer than 15 samples will simply be too noisy for the network to be biologically meaningful
I'm wondering if anyone here would be able to further clarify why this is the case. In my attempts to understand this question, I've thought of two possibilities that partially overlap...
(1) Having <15 samples invalidates conclusions because results might be spuriously driven by one or a couple of replicates
(2) Having ≥15 samples is suggested because a smaller number might not have enough power to detect any biological trends, i.e. module eigengenes won't have any underlying biological significance
Is either of these thoughts correct, or are both?
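One way to make the "too noisy" claim concrete is to look at how wide the confidence interval around a single Pearson correlation is at different sample sizes. The sketch below uses the standard Fisher z-transformation approximation. The function name `fisher_ci` and the example value r = 0.5 are illustrative choices, not anything taken from WGCNA itself.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n samples, via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("Fisher's approximation needs n > 3")
    z = math.atanh(r)                       # transform r to the z scale
    half_width = z_crit / math.sqrt(n - 3)  # standard error is 1/sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Illustrative only: the same observed correlation at several sample sizes.
for n in (6, 12, 15, 30, 100):
    lo, hi = fisher_ci(0.5, n)
    print(f"n={n:3d}  r=0.50  95% CI = [{lo:+.2f}, {hi:+.2f}]")
```

Under this approximation, an observed gene-gene correlation of 0.5 from 12 samples is compatible with a true correlation anywhere from roughly -0.10 to 0.83, and the interval shrinks only slowly as n grows. A co-expression network is built from a very large number of such pairwise correlations, which is presumably where the concern about noise comes from.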
Out of interest, I ran WGCNA on a data set of 12 samples. I first recovered module eigengenes and then correlated them with three binary traits. The results look entirely reasonable: for each trait, the most highly correlated module tells an interesting biological story that agrees with a standard differential expression + GO enrichment analysis. I should note that the data are heterogeneous, with differences in expression between samples wholly reflecting the traits of interest (as per point 5 here: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html). I'm guessing this might be why the results appear sensible: the data are highly informative, and noise is not swamping the biological signals we are interested in. In a case like this, is it fair to say that the 15-sample rule of thumb might not matter?
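A simple check on possibility (1), that a result is driven by one or two samples, is to recompute each eigengene-trait correlation with every sample left out in turn and see how much it moves. The sketch below does this in plain Python on synthetic data. The 12 values of `eigengene` and the 6-vs-6 binary `trait` are made up for illustration and are not the data described above, and `pearson` and `leave_one_out` are hypothetical helper names.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(x, y):
    """Correlation of x and y recomputed with each sample removed in turn."""
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]

# Synthetic stand-in data (assumption): 12 samples, binary trait split 6 vs 6,
# and an eigengene that tracks the trait plus Gaussian noise.
rng = random.Random(0)
trait = [0.0] * 6 + [1.0] * 6
eigengene = [0.8 * t + rng.gauss(0.0, 0.3) for t in trait]

full_r = pearson(eigengene, trait)
loo = leave_one_out(eigengene, trait)
print(f"all 12 samples: r = {full_r:+.2f}")
print(f"leave-one-out range: {min(loo):+.2f} to {max(loo):+.2f}")
```

If the leave-one-out range stays close to the full-data correlation, no single sample is carrying the result. A large swing when one particular sample is dropped would point to exactly the kind of spurious, replicate-driven association that small sample sizes make hard to rule out.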
Any comments would be appreciated :)