Question

Differential abundance analysis of 16S data with extremely unbalanced cohorts in DESeq2

dr.aj.scott • 0

@drajscott-20996

Last seen 6.5 years ago

For full disclosure, I posted this question on the phyloseq GitHub page, but perhaps this forum is more appropriate.

I have a 16S dataset from gut mucosa and want to analyse differential abundance according to a factor. I have 200 samples: 10 cases and 190 controls.

Q1: Is it valid to use DESeq2 to compare differential abundance with such a large imbalance between the numbers of cases and controls? I know that DESeq2 is designed to deal with some imbalance in sample sizes, but I'm unclear about whether this applies equally to 16S data as it does to RNA-seq data. There is significant inter-individual variation in 16S data that I'm concerned would prevent

Q2: Assuming the above is not a valid way to proceed (i.e. comparing 190 controls with 10 cases), how should this analysis be performed? Should I subsample from my controls (while trying to match other factors between cases and controls)? The problem with this approach is that comparisons with different subsamples produce different results (probably because of the inherently large inter-sample variability in 16S data). Also, on what basis would you decide the size of the control subsample: 10, 20, 30?

Q3: A further alternative could be to select x controls for comparison to the cases, then resample the controls n times to build a distribution of n fold changes for each taxon between cases and controls. Is this statistically valid? How could such an approach be applied with DESeq2?
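
To make the resampling idea in Q3 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a DESeq2 workflow: the fold change is a plain log2 ratio of mean raw counts with a pseudocount, with no size-factor normalization or dispersion modelling, and every name in it (`log2_fold_change`, `resampled_fold_changes`, `taxon_A`, the toy counts) is hypothetical. It only shows the mechanics of drawing x controls n times and collecting n fold changes per taxon; it says nothing about whether the procedure is statistically valid.

```python
import math
import random
import statistics


def log2_fold_change(case_counts, control_counts, pseudocount=1.0):
    """Log2 ratio of mean counts (cases over controls), with a pseudocount."""
    case_mean = statistics.fmean(case_counts) + pseudocount
    control_mean = statistics.fmean(control_counts) + pseudocount
    return math.log2(case_mean / control_mean)


def resampled_fold_changes(cases, controls, n_controls, n_resamples, seed=0):
    """Draw control subsets repeatedly; collect one fold change per taxon per draw.

    cases, controls: dict mapping taxon -> list of counts, one entry per sample.
    Returns a dict mapping taxon -> list of n_resamples log2 fold changes.
    """
    rng = random.Random(seed)
    total_controls = len(next(iter(controls.values())))
    results = {taxon: [] for taxon in cases}
    for _ in range(n_resamples):
        # Within one draw, controls are sampled without replacement.
        idx = rng.sample(range(total_controls), n_controls)
        for taxon in cases:
            subset = [controls[taxon][i] for i in idx]
            results[taxon].append(log2_fold_change(cases[taxon], subset))
    return results


# Hypothetical toy data: 2 taxa, 10 cases, 190 controls (not real 16S counts).
toy_rng = random.Random(1)
cases = {
    "taxon_A": [toy_rng.randint(40, 60) for _ in range(10)],
    "taxon_B": [toy_rng.randint(0, 20) for _ in range(10)],
}
controls = {
    "taxon_A": [toy_rng.randint(0, 20) for _ in range(190)],
    "taxon_B": [toy_rng.randint(0, 20) for _ in range(190)],
}

lfc = resampled_fold_changes(cases, controls, n_controls=20, n_resamples=100)
for taxon, values in lfc.items():
    # Summarize each taxon's distribution of fold changes by its median.
    print(taxon, round(statistics.median(values), 2))
```

The spread of each taxon's list shows how much the estimate depends on which controls happen to be drawn, which is the instability described in Q2.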

I'd be grateful for any insight anyone might have on this issue. I have researched the question but have not found it discussed anywhere.

Many thanks for your thoughts.

deseq2 • 781 views

updated 6.5 years ago by Michael Love 43k • written 6.5 years ago by dr.aj.scott • 0

score 0 · Answer 1 · 2019-06-11

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

DESeq2 can handle imbalanced class size in RNA-seq.

As I’ve said on the forum previously, I’m not familiar with 16S data, and I’ve become skeptical that DESeq2 is the best tool for it, as the data don’t always look similar to RNA-seq. I just haven’t had time to investigate, and it’s not my area of expertise.
