Hello, I am currently working with shotgun metagenomic sequencing data from the human gut microbiome, and I aim to construct a co-occurrence network.
Below is the simple workflow that I've been working with:
(1) Taxonomic profiling: Run MetaPhlAn4 with options --ignore_eukaryotes --ignore_archaea -t rel_ab_w_read_stats
(2) Species-level absolute count table quality control: Remove species with relative abundance < 0.001% or prevalence < 5% of total samples.
(3) Infer correlation: Run FastSpar
(4) Edge list/Adjacency matrix quality control: Remove edges with |r| < 0.2 or FDR > 0.05
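
To make steps (2) and (4) concrete, here is a minimal Python sketch of how I understand the two filters. The function names and default values are my own, not part of MetaPhlAn or FastSpar. I am also assuming that "prevalence" means the fraction of samples in which a species reaches the abundance cutoff, and that the FDR is a Benjamini-Hochberg adjustment of the p-values; both assumptions may differ from what other people do.

```python
import numpy as np

def filter_species(counts, min_rel_abund=1e-5, min_prevalence=0.05):
    """Keep species (rows) that reach min_rel_abund (0.001% = 1e-5)
    in at least min_prevalence of the samples (columns).
    Assumes no sample has a total count of zero."""
    counts = np.asarray(counts, dtype=float)
    rel = counts / counts.sum(axis=0)            # per-sample relative abundance
    prevalence = (rel >= min_rel_abund).mean(axis=1)
    keep = prevalence >= min_prevalence
    return counts[keep], keep

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def filter_edges(corr, pvals, min_abs_r=0.2, max_fdr=0.05):
    """Return (i, j, r) for species pairs with |r| >= min_abs_r and
    adjusted p-value <= max_fdr. Both inputs are square symmetric matrices."""
    corr = np.asarray(corr, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    rows, cols = np.triu_indices_from(corr, k=1)  # each pair once, no diagonal
    r = corr[rows, cols]
    q = bh_adjust(pvals[rows, cols])
    keep = (np.abs(r) >= min_abs_r) & (q <= max_fdr)
    return [(int(i), int(j), float(v))
            for i, j, v in zip(rows[keep], cols[keep], r[keep])]
```

With this sketch, an edge with a raw p-value just below 0.05 can still be removed after the adjustment, which is one reason the number of edges is sensitive to how the FDR is computed.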
I have some questions about building a robust co-occurrence network from shotgun metagenomic data.
Q1. As far as I know, a minimum relative abundance of 0.001% and a prevalence of 5-20% are commonly used thresholds for count table quality control.
However, the appropriate thresholds for relative abundance and prevalence can differ between datasets, so how do we decide which thresholds are best for a given dataset?
Should we check the sparsity? If so, what is the optimal sparsity for a microbiome dataset? Or are there other indices we can use?
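
For reference, this is roughly how I would tabulate sparsity against the prevalence cutoff. It is only a sketch: the function names are hypothetical, sparsity is taken to be the fraction of zero entries in the species x samples table, and prevalence here is simply the fraction of samples with a non-zero count. I do not know which resulting value should count as acceptable, which is the point of the question.

```python
import numpy as np

def sparsity(counts):
    """Fraction of zero entries in a species x samples count table."""
    counts = np.asarray(counts)
    return float((counts == 0).mean())

def sparsity_by_prevalence(counts, cutoffs=(0.05, 0.10, 0.20)):
    """For each prevalence cutoff, report (cutoff, species kept, sparsity)."""
    counts = np.asarray(counts)
    prevalence = (counts > 0).mean(axis=1)   # fraction of samples with the species
    rows = []
    for cutoff in cutoffs:
        kept = counts[prevalence >= cutoff]
        value = sparsity(kept) if kept.size else float("nan")
        rows.append((cutoff, int(kept.shape[0]), value))
    return rows
```

Running this over a grid of cutoffs at least shows how quickly species are lost and how much the zero fraction drops at each step.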
Q2. Similar to Q1, how do we choose the optimal threshold for edge filtering?
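
One thing I can already do is check how sensitive the edge count is to the |r| cutoff. The sketch below assumes a square, symmetric correlation matrix loaded as an array; the function name and the cutoff grid are made up for illustration.

```python
import numpy as np

def edges_per_cutoff(corr, cutoffs=(0.1, 0.2, 0.3, 0.5)):
    """Count species pairs with |r| >= cutoff, for each cutoff."""
    corr = np.asarray(corr, dtype=float)
    # upper triangle without the diagonal, so each pair is counted once
    abs_r = np.abs(corr[np.triu_indices_from(corr, k=1)])
    return [(cutoff, int((abs_r >= cutoff).sum())) for cutoff in cutoffs]
```

This only describes the trade-off; it does not tell me which cutoff is correct.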
Q3. Finally, once the network is built from the quality-controlled data, we can calculate various network statistics such as edge density, average degree, modularity, etc. Can these statistics be used as indicators of a reasonable network?
For example, I have heard that protein-protein interaction networks tend to have an edge density of about 5%.
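
To be explicit about the definitions I have in mind, here is a small sketch for an undirected network without self-loops: edge density is E / (N(N-1)/2) and average degree is 2E / N. The function name is hypothetical, and modularity is left out because it also depends on the community detection method used.

```python
def network_stats(n_nodes, edges):
    """Edge density and average degree of an undirected simple graph.
    edges: iterable of (i, j) or (i, j, weight) tuples."""
    # collapse duplicates such as (0, 1) and (1, 0), and drop self-loops
    unique = {tuple(sorted(e[:2])) for e in edges if e[0] != e[1]}
    n_edges = len(unique)
    possible = n_nodes * (n_nodes - 1) / 2
    density = n_edges / possible if possible else 0.0
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0
    return {"nodes": n_nodes, "edges": n_edges,
            "density": density, "avg_degree": avg_degree}
```

Note that the density depends on whether N counts all species that passed filtering or only those with at least one edge, so the two conventions are not directly comparable.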
Thanks in advance for your answers.
