Question

WGCNA small number of genes


Asked by zson3366

I am a WGCNA user and have found it really useful for my RNA-seq analysis! Because of my current research interests, the data I plan to work on is qPCR data with a much smaller number of genes (around 30) and a relatively large sample size (1000 samples, highly heterogeneous). I could not get scale-free topology or identify any large module, which I think is because of the small number of genes. I searched extensively but could not find anyone asking about using WGCNA on small gene sets (yes, most datasets are now big data...).
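
To see why the fit is fragile at this size, here is a minimal Python sketch (WGCNA itself is an R package; this is not its API) that approximates the scale-free fit index shown in the WGCNA tutorials: soft-threshold the absolute gene-gene correlation, compute each gene's connectivity, bin it, and regress log10 frequency on log10 mean connectivity. The function name `scale_free_fit`, the equal-width binning and the default of 10 bins are assumptions for illustration.

```python
import numpy as np


def scale_free_fit(expr: np.ndarray, power: float, n_bins: int = 10) -> tuple[float, float]:
    """Approximate scale-free fit for an unsigned soft-thresholded network.

    expr: samples x genes matrix. Returns (signed R^2, slope), where the
    signed R^2 is -sign(slope) * R^2 as commonly plotted for WGCNA.
    Equal-width binning is an assumption, not WGCNA's exact procedure.
    """
    cor = np.corrcoef(expr, rowvar=False)   # gene-gene Pearson correlation
    adj = np.abs(cor) ** power              # unsigned soft-threshold adjacency
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    k = adj.sum(axis=1)                     # connectivity of each gene

    # Assign each gene's connectivity to one of n_bins equal-width bins
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    idx = np.clip(np.digitize(k, edges[1:-1]), 0, n_bins - 1)

    mean_k, freq = [], []
    for b in range(n_bins):
        members = k[idx == b]
        if members.size:                    # skip empty bins
            mean_k.append(members.mean())
            freq.append(members.size / k.size)

    x = np.log10(np.array(mean_k))
    y = np.log10(np.array(freq))
    slope, _intercept = np.polyfit(x, y, 1)  # linear fit on the log-log scale
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2), float(slope)
```

With 30 genes spread over 10 bins, each bin holds only a few genes, so the R^2 rests on a handful of noisy points and can swing considerably from one power to the next.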

Do you think WGCNA is still applicable to such a small dataset? I am planning to use power = 10 and to detect modules of at least 5 genes. Do you have any suggestions on parameter adjustment or on using the package in this situation?
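
The module-detection step described above (power 10, modules of at least 5 genes) can be sketched in Python as follows. This is a simplified stand-in, not WGCNA: it builds an unsigned soft-threshold adjacency, converts it to a topological overlap (TOM) dissimilarity, clusters genes with average linkage, and cuts the tree at a fixed height with SciPy's `fcluster`, whereas WGCNA usually uses a dynamic tree cut. The names `tom_similarity` and `find_modules` and the `cut_height=0.9` default are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def tom_similarity(adj: np.ndarray) -> np.ndarray:
    """Unsigned topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)              # no self-connections
    k = a.sum(axis=1)                     # connectivity
    shared = a @ a                        # l_ij: weight of shared neighbours
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom


def find_modules(expr: np.ndarray, power: float = 10,
                 cut_height: float = 0.9, min_size: int = 5) -> np.ndarray:
    """Return one label per gene; 0 marks genes left outside any module."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** power
    diss = 1.0 - tom_similarity(adj)      # TOM-based dissimilarity, zero diagonal
    tree = linkage(squareform(diss, checks=False), method="average")
    labels = fcluster(tree, t=cut_height, criterion="distance")  # static cut
    sizes = np.bincount(labels)
    # Clusters smaller than min_size are relabelled 0 ("unassigned")
    return np.where(sizes[labels] >= min_size, labels, 0)
```

Genes in clusters smaller than `min_size` get label 0, similar in spirit to WGCNA's grey "unassigned" module. With only 30 genes it is worth checking how sensitive the labels are to `power` and `cut_height`.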

I really appreciate your time and any suggestions!


score 0 · Answer 1 · 2017-01-30

I don't think a network following scale-free topology can be found with 30 genes: there simply aren't enough features to show the characteristics of such networks. You could use the hard-thresholding method instead (it is not explained in the tutorials, but it is well documented).
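
For concreteness, the two adjacency definitions can be contrasted in a short Python sketch. The helper names are hypothetical; in R, WGCNA provides its own hard-threshold tools such as `signumAdjacencyFunction` and `pickHardThreshold`. Soft thresholding raises the absolute correlation to a power, so every gene pair keeps a graded weight; hard thresholding keeps an edge only when the absolute correlation reaches a cutoff.

```python
import numpy as np


def soft_adjacency(cor: np.ndarray, power: float) -> np.ndarray:
    """Unsigned soft threshold: a_ij = |cor_ij| ** power (weighted network)."""
    adj = np.abs(cor) ** power
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj


def hard_adjacency(cor: np.ndarray, tau: float) -> np.ndarray:
    """Hard threshold: a_ij = 1 if |cor_ij| >= tau, else 0 (unweighted network)."""
    adj = (np.abs(cor) >= tau).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


# Toy correlation matrix for three genes (illustrative values only)
cor = np.array([[1.0, 0.8, -0.3],
                [0.8, 1.0, 0.5],
                [-0.3, 0.5, 1.0]])
print(soft_adjacency(cor, power=6).round(3))
print(hard_adjacency(cor, tau=0.6))
```

The soft version shrinks weak correlations toward zero without discarding them, while the hard version turns the network into a plain 0/1 graph, which avoids having to justify a power from a scale-free fit that 30 genes cannot support.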

However, be aware that you should correct for the high heterogeneity of the samples: the more similar the data are, the better WGCNA will reflect the underlying biology. Consider splitting the dataset by group/hospital/method... to obtain more homogeneous groups, and then use the consensus method. Note that using the multiData structure requires some changes to how the networks are built.
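
A simplified Python sketch of the consensus idea: build a TOM separately within each homogeneous group and take the elementwise minimum, so two genes count as strongly connected only if they are connected in every group. This omits the TOM calibration and other options of WGCNA's consensus functions; `consensus_tom`, its arguments and the unsigned power-6 default are assumptions for illustration.

```python
import numpy as np


def tom_similarity(adj: np.ndarray) -> np.ndarray:
    """Unsigned topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)
    tom = (a @ a + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


def consensus_tom(expr: np.ndarray, groups: np.ndarray, power: float = 6) -> np.ndarray:
    """Elementwise minimum of per-group TOMs (samples x genes input)."""
    toms = []
    for g in np.unique(groups):
        sub = expr[groups == g]           # samples belonging to one group
        adj = np.abs(np.corrcoef(sub, rowvar=False)) ** power
        toms.append(tom_similarity(adj))
    return np.minimum.reduce(toms)        # connected only if connected in all groups
```

`1 - consensus_tom(...)` can then be clustered in the same way as a single-set TOM dissimilarity. Taking the minimum is the most conservative choice; it lets any one group veto a connection.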

score 0 · Answer 2 · 2017-01-30

Indeed, with a small number of genes you are unlikely to construct a scale-free network, but the heterogeneity probably plays a large part as well. I suggest you read through the WGCNA FAQ, especially points 2, 5, and 6 (other parts of the FAQ can be helpful as well). Use a soft-thresholding power from the table in point 6. A consensus analysis is one way to deal with a heterogeneous data set, but there are others as well (point 5).
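
The power table in point 6 of the FAQ can be encoded as a small lookup. The values below are reproduced as we understand that table (power chosen by number of samples, with signed networks using double the unsigned/signed-hybrid value); please verify them against the current FAQ. The handling of exactly 20, 30 and 40 samples is our assumption, and the function name is hypothetical.

```python
def suggested_power(n_samples: int, signed: bool = False) -> int:
    """Soft-thresholding power by sample count, per our reading of the FAQ table.

    Boundary handling at 20, 30 and 40 samples is an assumption.
    """
    if n_samples < 20:
        power = 9
    elif n_samples < 30:
        power = 8
    elif n_samples < 40:
        power = 7
    else:
        power = 6
    # Signed networks use twice the unsigned / signed-hybrid power
    return 2 * power if signed else power


print(suggested_power(1000), suggested_power(1000, signed=True))
```

For the 1000 samples in the question, this gives 6 for an unsigned network, which is lower than the planned power of 10.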

On a more general note, for 30 genes I would first focus on suitable visualization rather than on trying to find modules. A simple heatmap with clustered genes, and samples organized either by clustering or by external information (the groups that make the samples so heterogeneous), may already tell you a lot and be easier to interpret than WGCNA modules and their eigengenes. Only if that does not provide all the needed information would I try WGCNA.
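
The row and column ordering for such a heatmap can be sketched in Python with SciPy. This is one possible choice, not a prescribed method: genes are clustered with average linkage on 1 - |correlation|, and samples are either ordered by an external grouping label (for example hospital, an assumed input) or clustered by Euclidean distance. The function name `clustered_order` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def clustered_order(expr: np.ndarray, sample_groups=None):
    """Return (sample_order, gene_order) for a heatmap of a samples x genes matrix."""
    # Genes: average-linkage clustering on 1 - |Pearson correlation|
    gene_diss = 1.0 - np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(gene_diss, 0.0)
    gene_tree = linkage(squareform(gene_diss, checks=False), method="average")
    gene_order = leaves_list(gene_tree)

    if sample_groups is not None:
        # Samples: keep external groups together (stable sort preserves input order within a group)
        sample_order = np.argsort(np.asarray(sample_groups), kind="stable")
    else:
        # Samples: average-linkage clustering on Euclidean distance
        sample_tree = linkage(expr, method="average", metric="euclidean")
        sample_order = leaves_list(sample_tree)
    return sample_order, gene_order
```

The reordered matrix `expr[np.ix_(sample_order, gene_order)]` can then be passed to any heatmap routine, for example matplotlib's `imshow`. Ordering samples by the external groups makes group-specific expression patterns, the likely source of the heterogeneity, visible directly.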