I'm trying to create an unsigned coexpression network for some RNA-seq data (number of genes = 20555, number of subjects = 379). I have already processed the data through TMM/voom. When I tried to create the coexpression network using the same commands as in the WGCNA tutorials, the scale-free topology fit reported by the pickSoftThreshold function does not reach 0.90; it only reaches ~0.70. My understanding was that the purpose of raising my data to a power beta is to remove noise embedded in the data. However, haven't I already done this by running TMM/voom?
I suppose I have 2 main questions:
1) Could someone explain the difference between raising RNA-seq data to a soft-thresholding power beta and normalizing RNA-seq data with TMM/voom?
2) Since the fit does not reach 0.90, so I cannot pick a soft-thresholding beta the way the tutorials do, what should I do to generate a coexpression network?
Amy
First, voom may not be the best approach here, since the point of voom is to create weights for each measurement, and WGCNA cannot use them (yet). I personally prefer the variance-stabilizing transformation in DESeq2, but you could also simply transform the normalized counts using log2(x+1). See also the WGCNA FAQ at https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html, item 4.
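To make the distinction in question 1 concrete, here is a minimal sketch in plain Python (standard library only, not the WGCNA code itself). It assumes a tiny, made-up table of normalized counts with genes in rows. The log2(x+1) step acts on the expression values, whereas the soft-thresholding power is applied afterwards to the absolute gene-gene correlations, which is how an unsigned network adjacency is defined. The helper names and beta = 6 are illustrative choices, not recommendations.

```python
import math

def log2_plus_one(counts):
    """Apply log2(x + 1) to every entry of a genes-by-samples table."""
    return [[math.log2(x + 1.0) for x in row] for row in counts]

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def unsigned_adjacency(expr, beta):
    """Unsigned adjacency: |cor(gene_i, gene_j)| raised to the power beta."""
    n = len(expr)
    return [[abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
            for i in range(n)]

# Hypothetical normalized counts: 3 genes (rows) x 4 samples (columns).
counts = [
    [0.0, 1.0, 3.0, 7.0],   # gene A
    [1.0, 3.0, 7.0, 15.0],  # gene B, tracks gene A closely
    [1.0, 0.0, 3.0, 1.0],   # gene C, only weakly related to gene A
]

expr = log2_plus_one(counts)            # transformation: changes expression values
adj = unsigned_adjacency(expr, beta=6)  # soft threshold: changes correlations

print(adj[0][1])  # strong correlation stays close to 1
print(adj[0][2])  # weak correlation (about 0.32) is pushed close to 0
```

The power leaves strong correlations nearly intact and shrinks weak ones, so it is a step on the correlation matrix rather than a replacement for normalizing or transforming the counts.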
Second, I would plot a sample clustering tree and make sure there are no major branches that would indicate an overall expression driver (e.g., a batch effect). If you find one, it usually helps to adjust for it, e.g., using ComBat on the variance stabilized data. See the WGCNA FAQ, item 6 and possibly 5.
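As a rough illustration of what a sample clustering tree looks for, the sketch below runs a naive average-linkage agglomeration on a made-up samples-by-genes table, using only the Python standard library. The data, the function names and the choice of Euclidean distance with average linkage are assumptions for the example; in R one would normally build and plot the tree with hclust on a dist object. If the final merge happens at a much greater height than the earlier ones, the samples split into two major branches, which is the pattern that would point to a batch effect or another global driver.

```python
import math

def sample_distances(expr):
    """Pairwise Euclidean distances; expr holds one row per sample."""
    n = len(expr)
    return [[math.dist(expr[i], expr[j]) for j in range(n)] for i in range(n)]

def average_linkage(dist):
    """Naive average-linkage clustering.

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical transformed expression: 4 samples (rows) x 3 genes (columns).
# Samples 0-1 and samples 2-3 mimic two batches with shifted overall levels.
samples = [
    [5.0, 5.0, 5.0],
    [5.0, 5.0, 6.0],
    [9.0, 9.0, 9.0],
    [9.0, 10.0, 9.0],
]

merges = average_linkage(sample_distances(samples))
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Here the last merge joins samples [0, 1] with samples [2, 3] at a height several times larger than the first two merges, which is the kind of major branch worth investigating before building the network.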
Peter,
I found a Bioconductor tutorial applying WGCNA to TCGA RNA-seq data (https://www.bioconductor.org/packages/devel/bioc/vignettes/CVE/inst/doc/WGCNA_from_TCGA_RNAseq.html) where they use voom to normalize the data and get rid of "non-varying" genes. Should I trust their method? If not, how do you suggest removing non-varying genes?
Thanx