Question: converting extremely sparse count dataframe to continuous distributions for study in WGCNA
2.7 years ago by
chrisclarkson10030 wrote:

I am very inexperienced with mathematics and expression data.

I have developed a WGCNA pipeline, practicing on gene-expression microarray data. I would now like to apply the same strategy to microbial community count data.

Initially I tried computing an adjacency matrix from the raw count data:

adjacency(df)


This did produce a set of plots. However, certain WGCNA commands do not work: for example, 'pickSoftThreshold' does not recommend a 'powerEstimate' and returns the following warning many times:

Warning in eval(expr, envir, enclos) :
Some correlations are NA in block 1 : 790 .
Warning in as.vector(log10(dk)) : NaNs produced
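
A likely cause of these NA correlations, assuming the count table contains taxa whose counts never vary across samples (common in very sparse data, though the post does not show the data), is that a correlation with a constant column is undefined. The base R sketch below uses a made-up matrix `counts` with hypothetical taxon names to show the effect and one simple filter. WGCNA's `goodSamplesGenes()` performs a similar zero-variance check.

```r
# Toy count matrix: rows are samples, columns are taxa.
# All values and names here are hypothetical, for illustration only.
counts <- cbind(
  taxonA = c(0, 3, 0, 7, 1),
  taxonB = c(5, 0, 2, 0, 9),
  taxonC = c(0, 0, 0, 0, 0)  # never observed, so its variance is zero
)

# Correlations involving a constant column are undefined and come back as NA
# (base R also warns that the standard deviation is zero).
cors <- suppressWarnings(cor(counts))
print(cors["taxonA", "taxonC"])

# One simple remedy: drop zero-variance columns before building a network.
keep <- apply(counts, 2, var) > 0
filtered <- counts[, keep, drop = FALSE]
print(colnames(filtered))
```

Filtering this way only removes the undefined correlations. It does not make raw counts suitable for a correlation-based method.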


So I tried using voom to convert the counts to a continuous dataset. This runs, but I am doubtful of voom's output:

voom(df, plot = TRUE)


Comparing this plot with a typical plot from the 'voom' paper (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29) suggests that this output is not valid, given that the data are so sparse.

How can I convert such a sparse count data frame into a valid continuous one?

The following post is related to this one, but I did not understand most of the terms used in it: voom for spectral counts

modified 2.7 years ago by Gordon Smyth37k • written 2.7 years ago by chrisclarkson10030
Answer: converting extremely sparse count dataframe to continuous distributions for stud
2.7 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

voom is designed for RNA-seq data. There is no reason to think it would work well for microbial counts, and this has little to do with sparseness.

voom is also incompatible with WGCNA, because WGCNA can't use the voom weights (which are the whole point of voom).

If you think that you can adapt voom or other RNA-seq methods to microbial counts, then this is your own statistical research project and your own responsibility. It is not something that the limma authors can advise you on. If you aren't a research statistician, then you might consider starting with what statisticians are already doing for microbial data, for example: