Question

DESeq2 differences in Wald and LRT in multi-factorial design

0

Entering edit mode

Mir • 0

@67a9a9c7

Last seen 3.4 years ago

Canada

Hi all!

So I'm running a differential abundance analysis where I'm looking at differentially abundant taxa between mice of different genotypes under different early-life stress conditions.

I've set up my design as follows, given I'm interested in the effects of genotype accounting for different levels of stress and the interaction between the two:

dds = phyloseq_to_deseq2(dat_pr_OBIT_f, ~GENOTYPE + Stress + GENOTYPE:Stress)

And I get the following for my resultsNames:

[1] "Intercept"             "GENOTYPE_TCR_vs_B6"    "Stress_Str_vs_NS"     
[4] "GENOTYPETCR.StressStr"

upon further examining the results table for "GENOTYPE_TCR_vs_B6":

res = results(dds, name = 'GENOTYPE_TCR_vs_B6')
res_df = data.frame(res)
res_df = (res_df
          %>% rownames_to_column('ASV'))

I get 204 taxa that meet my significance criteria (padj < 0.01 and log2foldchange > 1 in absolute value.

As far as I understand, this gives me the taxa that are differentially abundant between genotypes accounting for stress conditions and the interaction between the two, i.e. the main effect.

following the same procedure for "Stress_Str_vs_NS" and "GENOTYPETCR.StressStr", I get 109 and 93 significant taxa, respectively.

I assume the first of these gives the main effect of stress accounting for genotype and the interaction.

I have a few clarifications regarding these results.

1) What is the interpretation of log2fold change for taxa that are significantly differentially abundant in the interaction condition (as a result of inserting "GENOTYPETCR.StressStr" into the name argument of the results() function)?

2) Is my interpretation of other results correct (main effects, etc)?

3) I also tried to answer the same question using the LRT:

dds = phyloseq_to_deseq2(dat_pr_OBIT_f, ~GENOTYPE + Stress + GENOTYPE:Stress)

dds_lrt <- DESeq(dds, test="LRT", reduced = ~ Stress)

However this time with the same criteria I get only 46 significant taxa for Stress (switching out Stress for GENOTYPE in the reduced argument), but 282 significant taxa for genotype (with the code above).

Trying to examine the interaction term using either ~Stress or ~GENOTYPE as the reduced model gives different results from the Wald test as well, both or which are different than the results when I isolate the interaction effect (reduced = ~ Stress + GENOTYPE, only 32 significant taxa)

I'm wondering if this LRT approach is "more correct" than the Wald test based on my question, and whether I have the correct procedure for examining main effects and interactions. Given there is no way to put the interaction in the reduced model while maintaining minimum degrees of freedom ( I tried, it throws an error), is the LRT model as a whole still accounting/controlling for interaction when examining main effects?

Thanks so much in advance!

microbiome LRT DESeq2 • 1.3k views

ADD COMMENT • link updated 3.8 years ago by Michael Love 43k • written 3.8 years ago by Mir • 0

score 0 · Answer 1 · 2022-01-23

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

Unfortunately, I have to reserve my time on the support site for software related questions. For consultation about statistical design and interpretation of results, I recommend working with a local statistician or someone familiar with linear models in R.

ADD COMMENT • link 3.8 years ago Michael Love 43k