Question

phyloseq_to_deseq2 design strategy for multiple factors analysis (2 or more)

pyveronneau71 • 0

Hi everyone,

I have several soil amplicon-seq metagenomics data sets (16S, 18S, ITS) and I want to see whether some communities are significantly differentially abundant between treatments. When I compare the levels of a two-level factor (Treatment: A, B), I use the following commands:

library(phyloseq)  # provides phyloseq_to_deseq2()
library(DESeq2)
dia = phyloseq_to_deseq2(physeq_object, ~Treatment)
dia = DESeq(dia, test = "Wald", fitType = "parametric", parallel = FALSE)
res = results(dia, cooksCutoff = FALSE)

With this, I get a list of Differentially Expressed Communities (DEC). If I replace ~Treatment with ~0 + Treatment, I don't get the same results (I get more DEC with ~0 + Treatment).

Q1. Which of these approaches is more suitable for my type of analysis?

Also, when I have a factor with more than 2 levels (Regie: treatment1, treatment2, treatment3, treatment4), I use the following commands:

library(phyloseq)  # provides phyloseq_to_deseq2()
library(DESeq2)
dia = phyloseq_to_deseq2(physeq_object, ~0 + Regie)
dia = DESeq(dia, test = "Wald", fitType = "parametric", parallel = FALSE)
resultsNames(dia)
res = results(dia, contrast = c("Regie", "treatment1", "treatment2"), cooksCutoff = FALSE)

I run the last line inside a loop over the other possible pairwise comparisons.
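The loop over pairwise comparisons described above could be set up along these lines. This is a minimal sketch, assuming the four `Regie` levels named in the post; `regie_levels`, `level_pairs`, and `contrast_list` are illustrative names, and the DESeq2 call is left as a comment because it needs the fitted `dia` object.

```r
# Levels of the Regie factor, as listed in the post
regie_levels <- c("treatment1", "treatment2", "treatment3", "treatment4")

# All unordered pairs of levels (4 choose 2 = 6 comparisons)
level_pairs <- combn(regie_levels, 2, simplify = FALSE)

# One DESeq2-style contrast vector per pair: c(factor, numerator, denominator)
contrast_list <- lapply(level_pairs, function(p) c("Regie", p[1], p[2]))
names(contrast_list) <- vapply(level_pairs, paste, character(1), collapse = "_vs_")

print(names(contrast_list))

# With a fitted object `dia`, each table could then be extracted with:
# res_list <- lapply(contrast_list, function(ct)
#   results(dia, contrast = ct, cooksCutoff = FALSE))
```

In a contrast vector the second element is the numerator and the third the denominator of the log2 fold change. Each results table adjusts p-values only within that one comparison, so running six comparisons multiplies the number of tests being made overall.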

Q2. Is this the right approach if I want to compare multiple levels within the same factor? All the examples I've seen with multiple levels compared across different factors (Condition1: A, B, C and Condition2: D, E, F; e.g. A vs D, A vs E, etc.), but in my case the comparisons are always within the same factor.

Q3. There are several options in the DESeq() function. Are any of them recommended for metagenomics data (lots of zeros and low counts)? I tried some of them (e.g. test="LRT", reduced= ~ 1, sfType="poscounts") but I don't know which one is better. Any thoughts on that?

Thanks for your help, it's really appreciated!

Cheers,

PY

deseq2

updated 5.7 years ago by Michael Love • written 5.7 years ago by pyveronneau71

score 0 · Answer 1 · 2020-03-16

I don't have much feedback on using DESeq2 for microbiome / metagenomic analyses. I'm not convinced it is always a good model for these data types, and I have done no development of the software to support microbiome / metagenomics, so I do not have any particular recommendations. To the degree these data are similar to certain single-cell datasets (which I have profiled in collaboration with the zingeR and ZINB-WaVE authors), we found that the LRT and poscounts were good options for data more compatible with zero-inflated NB distributions.
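For illustration, here is a simplified base-R sketch of the idea behind "poscounts"-style size factors. It is not DESeq2's actual code: the toy count matrix and the helper `geo_mean_nz` are assumptions made up for this example. It shows why an ordinary geometric mean breaks down when every taxon has at least one zero, and how a geometric mean over the positive counts avoids that.

```r
# Toy count matrix: taxa in rows, samples in columns.
# Every taxon has a zero in one sample (a hypothetical, sparse layout).
counts <- matrix(c(0, 10, 20,
                   5,  0, 15,
                   8, 12,  0),
                 nrow = 3, byrow = TRUE)

# Ordinary log geometric mean: log(0) is -Inf, so every row is unusable
log_geo_means <- rowMeans(log(counts))
print(all(is.infinite(log_geo_means)))

# Modified geometric mean: sum the logs of the positive counts only,
# but still divide by the total number of samples
geo_mean_nz <- function(x) {
  if (all(x == 0)) 0 else exp(sum(log(x[x > 0])) / length(x))
}
geo_means <- apply(counts, 1, geo_mean_nz)

# Median-of-ratios per sample, using only positive counts
size_factors <- apply(counts, 2, function(cnts) {
  keep <- geo_means > 0 & cnts > 0
  exp(median(log(cnts[keep]) - log(geo_means[keep])))
})

# Rescale so the size factors have geometric mean 1
size_factors <- size_factors / exp(mean(log(size_factors)))
print(size_factors)
```

In DESeq2 itself these options are passed as arguments, for example `DESeq(dia, test = "LRT", reduced = ~ 1, sfType = "poscounts")`, where `dia` is the object from the question.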

Re: replacing ~Treatment with ~0 + Treatment, these are different model matrices with different interpretations of the coefficients.
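To make the difference concrete, here is a minimal base-R sketch using a hypothetical six-sample, two-level `Treatment` factor (the sample layout is an assumption, not the poster's data). It only builds the two model matrices with `model.matrix()`; no DESeq2 fit is involved.

```r
# Hypothetical sample table: three samples per treatment level
sample_table <- data.frame(Treatment = factor(c("A", "A", "A", "B", "B", "B")))

# ~ Treatment: an intercept (level A) plus a B-vs-A difference column
mm_intercept <- model.matrix(~ Treatment, data = sample_table)
print(colnames(mm_intercept))
# "(Intercept)" "TreatmentB"

# ~ 0 + Treatment: no intercept, one column per level
mm_cell_means <- model.matrix(~ 0 + Treatment, data = sample_table)
print(colnames(mm_cell_means))
# "TreatmentA" "TreatmentB"
```

With `~Treatment`, the second coefficient is the B vs A difference, which is what a differential abundance test usually targets. With `~0 + Treatment`, each coefficient is a per-level mean on the log2 scale, so a table extracted without an explicit `contrast` would, as far as I understand, test a level mean against zero rather than B against A, which could explain the larger number of significant results. Specifying `contrast = c("Treatment", "B", "A")` should request the same comparison under either design.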

You need to discuss all modeling choices with a statistician if you are unsure of the meaning of the coefficients (this is outside the scope of support I can provide).