Question

Differential Abundance Analysis with DESeq2 Using Metabolites as Continuous Predictor Variables

Asked by greenm11

Hi all,

I'm currently working with 16S data from an experiment involving two different mouse strains, sampled at 4 time points between P17 and P84. I'm trying to identify taxa that are differentially abundant between genotypes at each sampled postnatal age. I'm also hoping to use previously acquired metabolite data to identify taxa whose abundance is associated with short-chain fatty acid (SCFA) levels, treating those levels as a continuous predictor variable. Because this is a pilot study, my sample size is small: 12 fecal and 12 cecal samples per sampling group. I've worked through my data with both the ALDEx2 and DESeq2 pipelines, but I'm not sure my analyses are set up correctly for the questions I want to answer.

My metabolite data come from NMR spectroscopy, so all values are relative intensities ranging from 0 to 1. I'm concerned that this is distorting my results, given that the log fold change values DESeq2 outputs are per unit change of a continuous predictor, and here a single unit spans the entire range of the data.

Below is an example of my code for the fecal samples from the P28 time point, testing for taxa associated with butyrate levels. I repeated this independently for each time point after subsetting the data by age.

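```r
library(phyloseq)
library(DESeq2)
library(dplyr)   # provides the %>% pipe
library(tibble)  # provides rownames_to_column()

# dat_pr is an un-normalized sequence count table of ASVs for each sample
dat_pr_fecal_28_ap = subset_samples(dat_pr_fecal_clean_met, Age == 28)

# Model ASV counts as a function of butyrate level (continuous, 0-1)
dds_fecal_28 = phyloseq_to_deseq2(dat_pr_fecal_28_ap, ~ butyrate_levels)
dds_fecal_28 = DESeq(dds_fecal_28)

# Extract the butyrate coefficient (log2 fold change per unit of butyrate_levels)
res_fecal_28 = results(dds_fecal_28, name = "butyrate_levels", independentFiltering = FALSE)
res_fecal_28

# Convert to a data frame, moving the ASV IDs from row names into a column
res_df_fecal_28 = data.frame(res_fecal_28) %>%
  rownames_to_column("ASV")
head(res_df_fecal_28)
res_df_fecal_28
```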

Not only does this code return no significantly differentially abundant taxa (no adjusted p-values < 0.01), but the volcano plot also shows odd behavior: the -log p-values seem to plateau at a ceiling below this threshold (see here).

Please let me know whether my model is constructed correctly, and whether I'm missing anything that may be affecting my results!

Thanks so much in advance :)

Tags: deseq2, microbiome, R

Written by greenm11; last updated by Michael Love.

Cross-posted: https://www.biostars.org/p/462018/

(Comment by Kevin Blighe)

Answer (2020-09-18)

"I'm concerned that this is confounding my results, given the log fold change values that DESeq2 outputs are per unit of a continuous predictor."

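There isn't a problem here; DESeq2 can handle continuous predictors without issue. The log2 fold change is simply interpreted per unit of the predictor, which on a 0-1 scale means across the full range of intensities.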
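To illustrate why the 0-1 scale is not a problem, here is a minimal sketch on simulated data (the counts, sample size, and effect size are hypothetical, not from this thread). It uses base R's Poisson `glm()` as a stand-in for DESeq2's negative binomial GLM. Rescaling the predictor changes the per-unit log2 fold change by exactly the rescaling factor, while the Wald p-value stays the same. Since a linear rescaling is only a reparameterization of the model, the same should hold for DESeq2's Wald test.

```r
# Simulated data (hypothetical): counts for one taxon whose mean
# depends on a 0-1 "butyrate" intensity
set.seed(1)
n <- 12
butyrate <- runif(n, min = 0, max = 1)
counts <- rpois(n, lambda = exp(3 + 1.5 * butyrate))

# Poisson GLM as a simple stand-in for DESeq2's negative binomial GLM
fit_unit <- glm(counts ~ butyrate, family = poisson)

# Same model with the predictor expressed in units of 0.1 intensity
butyrate_tenths <- butyrate * 10
fit_tenths <- glm(counts ~ butyrate_tenths, family = poisson)

coef_unit <- coef(summary(fit_unit))["butyrate", ]
coef_tenths <- coef(summary(fit_tenths))["butyrate_tenths", ]

# glm coefficients are natural-log effects; divide by log(2) for log2 fold changes
lfc_unit <- unname(coef_unit["Estimate"]) / log(2)
lfc_tenths <- unname(coef_tenths["Estimate"]) / log(2)

# Wald test p-values for the predictor in each parameterization
p_unit <- unname(coef_unit["Pr(>|z|)"])
p_tenths <- unname(coef_tenths["Pr(>|z|)"])

print(c(lfc_unit = lfc_unit, lfc_tenths = lfc_tenths))  # lfc_unit is 10x lfc_tenths
print(c(p_unit = p_unit, p_tenths = p_tenths))          # identical p-values
```

If a per-unit LFC on a 0-1 scale is hard to interpret, one option is to rescale `butyrate_levels` in the sample data (for example, multiply by 10 so one unit is 0.1 of intensity) before calling `phyloseq_to_deseq2()`; the set of significant taxa should not change.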

Repeated values among the adjusted p-values are a property of the Benjamini-Hochberg (BH) procedure, and they are what produce the plateau in the volcano plot. I've posted an explanation previously.
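A minimal sketch of where the repeated values come from, using base R's `p.adjust()` on ten made-up p-values (hypothetical, not the poster's data). BH multiplies the i-th smallest of m p-values by m/i and then takes a cumulative minimum starting from the largest p-value, so any p-value whose scaled value exceeds that of a larger p-value inherits the same adjusted value. On a volcano plot of -log10 adjusted p-values, these ties show up as horizontal plateaus.

```r
# Hypothetical raw p-values, sorted for readability
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)

# Benjamini-Hochberg adjustment from base R's stats package
padj <- p.adjust(p, method = "BH")
print(round(padj, 4))
# 0.0100 0.0400 0.0840 0.0840 0.0840 0.1000 0.1057 0.2160 0.2160 0.2160
```

Here three distinct raw p-values (0.039, 0.041, 0.042) all map to the same adjusted value of 0.084, and the last three map to 0.216.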