My name is David Brohawn and I am a 4th year graduate student at Virginia Commonwealth University.
My advisor and I research ALS (Lou Gehrig's disease). I recently generated high-depth RNA-Sequencing data (50 million 2x150 read pairs) for 15 human postmortem spinal cord tissue homogenates (7 ALS samples and 8 matched healthy controls). I processed the data with the Tuxedo Suite pipeline and now have FPKM measurements for all annotated genes in the hg19 build for every sample.
I find WGCNA highly intriguing, and would love to perform a consensus network analysis with these data before assessing differences between the two groups. The R code appears very manageable.
However, I am unclear whether we have enough samples to run this analysis.
I see the first question on the WGCNA FAQ page says:
"We do not recommend attempting WGCNA on a data set consisting of fewer than 15 samples. In a typical high-throughput setting, correlations on fewer than 15 samples will simply be too noisy for the network to be biologically meaningful."
1) Does this mean 15 samples in each group (cases and controls, so 30 total) or 15 total?
Sifting through many publications and online forums (SEQanswers, Biostars) has not shed much light on this sample-size question, and I find very few published reports applying WGCNA to RNA-Sequencing data with smaller N's.
2) Presuming 15 total samples is the minimum recommended and we can run a consensus analysis, do you recommend the Spearman correlation or the biweight midcorrelation, given that we will be susceptible to outliers? My understanding is that Spearman is more robust to outliers, though, because it uses only ranks, it may wash out true signals.
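To make the outlier concern concrete, here is a toy sketch (made-up numbers, not our data, and using Python/scipy rather than R purely for illustration) of how a single extreme sample can dominate a product-moment correlation while a rank-based correlation like Spearman stays close to the true null; my understanding is that WGCNA's bicor down-weights such samples rather than discarding magnitude information entirely:

```python
# Toy illustration: one extreme sample inflates Pearson correlation,
# while rank-based Spearman correlation is far less affected.
# All values are fabricated for demonstration.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Eight "samples" with essentially no relationship between two genes,
# plus one outlier sample that is extreme for both genes at once.
gene_a = np.array([1., 2., 3., 4., 5., 6., 7., 8., 100.])
gene_b = np.array([5., 3., 8., 1., 7., 2., 6., 4., 100.])

r_pearson, _ = pearsonr(gene_a, gene_b)
r_spearman, _ = spearmanr(gene_a, gene_b)
print(f"Pearson:  {r_pearson:.2f}")   # ~0.99, driven almost entirely by the outlier
print(f"Spearman: {r_spearman:.2f}")  # ~0.27, much closer to the true (null) correlation
```

With only 15 samples, a couple of such samples could presumably distort an entire co-expression network built on Pearson correlations, which is why the choice between Spearman and bicor seems important to us.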
Please advise, and thank you,