Hi!
I would like some suggestions on filtering low variance genes for WGCNA.
I have done a round of WGCNA exercises on my own RNA-seq data. I filtered out genes with low counts (fewer than 10 counts in more than 90% of samples) and pre-processed the data with the VST function from the DESeq2 package, as recommended on the WGCNA FAQ page. This gave me a total of 18303 genes (originally 30023 genes) for the network analysis. I got 14 nice modules, with gene numbers ranging from 60 to 5600 per module.
Now I'm considering reducing the number of input genes, in the hope of getting modules with fewer genes, since my ultimate goal is to pick some hub genes for downstream functional studies. I have read some publications that preprocessed their data by removing genes with a variance below 0.05 across all samples before network analysis. I think this is a good idea that I could try too, since low-expressed or non-varying genes usually represent noise, as suggested by the WGCNA FAQ.
However, I'm not sure at which stage I should filter by variance. Should I 1) filter by variance and counts first, then apply VST to the resulting list, or 2) filter by counts, apply VST, then filter by variance?
Any suggestion is appreciated!
I don't have any particular suggestion on WGCNA input, but I'd prefer (2) over (1): in raw counts the variance grows with the mean, so filtering by variance before transformation will just be filtering on the mean.
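To illustrate the point, here is a small toy simulation in plain Python. It is only a sketch under loud assumptions: the counts are simulated as pure Poisson noise (no biological variability at all), the gene and sample numbers are made up, and a square-root transform stands in for a variance-stabilizing transformation. It is not DESeq2's VST and the names `poisson` and `pearson` are just helpers written for this example. On the raw counts, per-gene variance should closely track the per-gene mean; after the transform, that relationship should largely disappear.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rng = random.Random(1)
n_genes, n_samples = 60, 40  # hypothetical sizes, chosen only to keep this fast

# Each simulated "gene" has its own true mean (20 to 400) and nothing but Poisson noise.
true_means = [20 + i * (380 / (n_genes - 1)) for i in range(n_genes)]
counts = [[poisson(m, rng) for _ in range(n_samples)] for m in true_means]

gene_means = [statistics.mean(row) for row in counts]

# Per-gene variance on the raw counts.
raw_var = [statistics.variance(row) for row in counts]

# Per-gene variance after a square-root transform (stand-in for a VST, Poisson case only).
stab_var = [statistics.variance([math.sqrt(c) for c in row]) for row in counts]

raw_corr = pearson(gene_means, raw_var)
stab_corr = pearson(gene_means, stab_var)

print(f"correlation of mean with raw variance:         {raw_corr:.2f}")
print(f"correlation of mean with transformed variance: {stab_corr:.2f}")
```

In this toy setup none of the genes carries any real signal, yet a variance cutoff on the raw counts would keep the highly expressed genes and drop the lowly expressed ones. After the transform, the variance no longer says much about expression level, so a variance filter at that stage has a chance of reflecting actual variability between samples.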