Hi all!
So I'm running a differential abundance analysis where I'm looking at differentially abundant taxa between mice of different genotypes under different early-life stress conditions.
I've set up my design as follows, given I'm interested in the effects of genotype accounting for different levels of stress and the interaction between the two:
dds = phyloseq_to_deseq2(dat_pr_OBIT_f, ~GENOTYPE + Stress + GENOTYPE:Stress)
And I get the following for my resultsNames:
[1] "Intercept" "GENOTYPE_TCR_vs_B6" "Stress_Str_vs_NS"
[4] "GENOTYPETCR.StressStr"
upon further examining the results table for "GENOTYPE_TCR_vs_B6":
res = results(dds, name = 'GENOTYPE_TCR_vs_B6')
res_df = data.frame(res)
res_df = (res_df
%>% rownames_to_column('ASV'))
I get 204 taxa that meet my significance criteria (padj < 0.01 and log2foldchange > 1 in absolute value.
As far as I understand, this gives me the taxa that are differentially abundant between genotypes accounting for stress conditions and the interaction between the two, i.e. the main effect.
following the same procedure for "Stress_Str_vs_NS" and "GENOTYPETCR.StressStr", I get 109 and 93 significant taxa, respectively.
I assume the first of these gives the main effect of stress accounting for genotype and the interaction.
I have a few clarifications regarding these results.
1) What is the interpretation of log2fold change for taxa that are significantly differentially abundant in the interaction condition (as a result of inserting "GENOTYPETCR.StressStr" into the name argument of the results() function)?
2) Is my interpretation of other results correct (main effects, etc)?
3) I also tried to answer the same question using the LRT:
dds = phyloseq_to_deseq2(dat_pr_OBIT_f, ~GENOTYPE + Stress + GENOTYPE:Stress)
dds_lrt <- DESeq(dds, test="LRT", reduced = ~ Stress)
However this time with the same criteria I get only 46 significant taxa for Stress (switching out Stress for GENOTYPE in the reduced argument), but 282 significant taxa for genotype (with the code above).
Trying to examine the interaction term using either ~Stress or ~GENOTYPE as the reduced model gives different results from the Wald test as well, both or which are different than the results when I isolate the interaction effect (reduced = ~ Stress + GENOTYPE, only 32 significant taxa)
I'm wondering if this LRT approach is "more correct" than the Wald test based on my question, and whether I have the correct procedure for examining main effects and interactions. Given there is no way to put the interaction in the reduced model while maintaining minimum degrees of freedom ( I tried, it throws an error), is the LRT model as a whole still accounting/controlling for interaction when examining main effects?
Thanks so much in advance!