Question

WGCNA - module membership preservation for low count genes

ly.leifels ▴ 10

@lyleifels-13624

Last seen 6.8 years ago

I have a single-cell RNA sequencing dataset with 1800 samples. Some genes have only a few counts. Using WGCNA I can compute modules and determine each gene's module membership for every module. I want to find the count threshold above which a gene is reliably assigned to one (or two) module(s).

Would it be valid to do the following? Define low-count genes (e.g. counts below 5) by computing max(counts) for each gene in the original dataset and selecting the gene names. Create subsets containing 80% of the original dataset's counts and compute the module membership in each subset. Compare module labels between subsets for groups of genes (counts below 5, counts between 5 and 10, and so on) and select the group for which the module label doesn't change. Is there a more statistically valid way to compute module membership preservation for genes?

wgcna • 1.7k views

Posted 7.0 years ago by ly.leifels

Thank you for your answer! Of course I am trying to figure out a solution myself, but I am not sure whether some of my ideas make biological or statistical sense. Thank you for your advice! :)

Comment posted 7.0 years ago by ly.leifels

score 2 · Accepted Answer · 2017-07-30

This is a research project (which you will hopefully solve) and we won't be able to give you a solution. I suspect that the ability to reliably cluster a gene depends on two things: its counts (WGCNA doesn't work all that well with low counts) and the number of zero values the gene has. In single-cell sequencing many of the zeros mean the gene was not captured, so such a zero is really more of a missing value than a true zero abundance. Your idea of using resampling is good, but I'm not sure about pre-selecting genes. You could resample, get modules, figure out one or more good statistics for whether genes remain in the same module (this will be tricky), and relate those statistics to max counts, mean counts, numbers of zeros, or whatever else you can think of.
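One possible shape for such a statistic, as a minimal sketch rather than a recommended method: module names are arbitrary from one run to the next, so instead of comparing labels directly, compare each gene's set of module-mates between the full-data clustering and each resampled clustering (Jaccard overlap), then average that score within max-count bins. The sketch is plain Python for illustration only. It assumes the module labels were already obtained elsewhere (for example from WGCNA's `blockwiseModules` in R) and exported as gene-to-label mappings. The function names, the toy labels, and the bin edges are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def module_mates(labels):
    """Map each gene to the set of other genes that share its module label."""
    members = defaultdict(set)
    for gene, module in labels.items():
        members[module].add(gene)
    return {gene: members[module] - {gene} for gene, module in labels.items()}


def gene_stability(reference, resampled_runs):
    """Per-gene mean Jaccard overlap of module-mates, reference vs. each run.

    Using module-mates instead of labels avoids matching module names
    across runs. Only genes present in both clusterings are compared.
    """
    ref_mates = module_mates(reference)
    ref_genes = set(reference)
    scores = defaultdict(list)
    for run in resampled_runs:
        run_mates = module_mates(run)
        run_genes = set(run)
        for gene, ref_set in ref_mates.items():
            if gene not in run_mates:
                continue  # gene missing from this run, e.g. filtered out
            a = ref_set & run_genes
            b = run_mates[gene] & ref_genes
            union = a | b
            # Assumption: a gene with no module-mates in either clustering
            # counts as perfectly stable.
            scores[gene].append(len(a & b) / len(union) if union else 1.0)
    return {gene: mean(vals) for gene, vals in scores.items()}


def mean_stability_by_bin(stability, max_counts, edges=(5, 10)):
    """Average stability per max-count bin (default: <5, 5 to 9, >=10)."""
    bins = defaultdict(list)
    for gene, score in stability.items():
        bin_index = sum(max_counts[gene] >= edge for edge in edges)
        bins[bin_index].append(score)
    return {idx: mean(vals) for idx, vals in sorted(bins.items())}


# Toy data (made up): module labels from the full dataset and two resamples.
reference = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "red", "g5": "red"}
runs = [
    # Same partition as the reference, only the module names differ.
    {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B"},
    # g3 has moved to the other module.
    {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"},
]
max_counts = {"g1": 40, "g2": 25, "g3": 3, "g4": 12, "g5": 8}

stability = gene_stability(reference, runs)
by_bin = mean_stability_by_bin(stability, max_counts)
print(stability)
print(by_bin)
```

In this toy example the gene that switches modules (g3) gets the lowest score, and the genes it leaves or joins are penalised too, which is one of the reasons a per-gene statistic is tricky. The same scores could just as well be related to mean counts or to the number of zeros per gene instead of max-count bins.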