humann2 is a comprehensive pipeline for metagenome profiling. The method is similar to standard eukaryotic transcriptomics approaches (map reads to genome/genes), but humann2 calculates gene family abundances as weighted sums of the alignments from each read, normalized by gene length and alignment quality.
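To make the concern concrete, here is a toy sketch of the kind of calculation described above. It is not humann2's actual code: the function name `weighted_rpk`, the input layout, and the use of raw alignment scores as weights are all my own assumptions for illustration. Each read is split across the genes it hits in proportion to its alignment scores, and each gene's total is then divided by its length in kilobases.

```python
# Toy illustration only: NOT humann2's implementation.
# The function name, input format, and scoring scheme are hypothetical.
from collections import defaultdict


def weighted_rpk(alignments, gene_lengths):
    """Return length-normalized, alignment-weighted abundances per gene.

    alignments   : list of (read_id, gene_id, score) tuples, score > 0
    gene_lengths : dict mapping gene_id -> gene length in nucleotides
    """
    # Sum the alignment scores of each read so that every read
    # contributes a total weight of exactly 1 across all of its hits.
    read_totals = defaultdict(float)
    for read_id, _gene_id, score in alignments:
        read_totals[read_id] += score

    abundance = defaultdict(float)
    for read_id, gene_id, score in alignments:
        weight = score / read_totals[read_id]   # fraction of the read
        kilobases = gene_lengths[gene_id] / 1000.0
        abundance[gene_id] += weight / kilobases  # reads per kilobase
    return dict(abundance)


# Made-up example: read1 hits two genes, read2 hits only geneA.
alignments = [
    ("read1", "geneA", 90.0),
    ("read1", "geneB", 30.0),
    ("read2", "geneA", 100.0),
]
gene_lengths = {"geneA": 1500, "geneB": 600}

rpk = weighted_rpk(alignments, gene_lengths)
for gene, value in sorted(rpk.items()):
    print(gene, round(value, 4))
```

The resulting values are fractional rather than integer counts, which is exactly why I am unsure about feeding them to count-based models. Dropping the division by `kilobases` in this sketch would still leave fractional values whenever a read is split across several genes.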
There are many posts stating that edgeR and DESeq2 should (usually) only be used on raw counts, but those posts are written in the context of standard transcriptomics workflows, not metagenomics.
In the context of metagenomics, would it be appropriate to use the humann2 output for differential abundance analysis with edgeR or DESeq2? What if the data were normalized only by alignment quality and not by gene length?
I would like to use humann2 if possible, but not if it limits downstream analysis options such as DESeq2 and edgeR.
Hello!
I am in a very similar situation. I have run a metatranscriptomic analysis of a set of RNA samples with humann2. My main goal is to determine whether a treatment affects the gene expression of the microbiome in my samples, so I want to perform a differential expression analysis in R. However, as you know, humann2 reports normalized data (RPK or CPM), which makes it unsuitable for standard analysis tools such as edgeR or DESeq2. Could you please tell me which tool I can use to run the analysis on normalized data? Is there an option in humann2 that outputs raw counts? Given my goal, should I consider using a different tool instead of humann2?
Thank you very much!