Question

DESeq2: working with an unbalanced number of samples in a tumor study?

Cihat • 0

@Cihat-24724

I'd like to apply DESeq2 to breast cancer RNA-seq expression data to compare metastasis vs non-metastasis patient groups. I have 880 samples in the non-metastasis group and only 20 samples in the metastasis group. I searched for whether it makes sense to use DESeq2 (or any other differential expression analysis) with such a difference in group sizes, but could not find many resources to justify my study.

I only came across a few Biostars and Bioconductor posts questioning, for example, the use of 15 vs 3 samples. As far as I understand, DESeq2 generally works fine with unbalanced sample sizes, but does that hold for a 20 vs 880 comparison?

I also did a PubMed search; as far as I can see, there are no studies tackling this problem.

Thank you in advance.

Deseq2 RNASeqRData unbalancedsamplesize • 1.6k views

ADD COMMENT • link updated 3.5 years ago by Michael Love 43k • written 4.8 years ago by Cihat • 0

score 4 · Accepted Answer · 2021-02-05

Michael Love 43k

@mikelove

There is no problem with the balance, but I would tend to use limma-voom for analyses with hundreds of bulk RNA-seq samples, as it is much faster. I like to use DESeq2 for its Bayesian moderation of fold changes in particular, but that is not relevant with a sample size this large.

ADD COMMENT • link 4.8 years ago Michael Love 43k

Michael, thank you so much for your quick reply.

ADD REPLY • link 4.8 years ago Cihat • 0

Hi, I have a similar problem in my analysis: 35 vs 800 samples. I found several posts where you commented that "There is no problem with the balance" for DESeq2. I also found a figure comparing 3 vs 3 and 2 vs 3 samples in one of your replies, but do you have a literature reference supporting your statement in the case of highly imbalanced datasets? Thank you in advance.

ADD REPLY • link 3.5 years ago sadiksha.adhikari • 0

It's just that there is no breakdown point for linear models with imbalanced data. The estimates are not biased, although you lose efficiency (power).

ADD REPLY • link 3.5 years ago Michael Love 43k
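
The point in the last reply (unbiased estimates, reduced efficiency) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it uses normally distributed values and a plain difference in group means, not DESeq2's negative binomial model or limma-voom, and the function names `se_factor` and `simulate_diff` are hypothetical helpers made up for this example. It compares the 20 vs 880 design from the question with a balanced 450 vs 450 design of the same total size.

```python
import math
import random
import statistics


def se_factor(n1, n2):
    """Standard error of a difference in group means, in units of the
    per-sample standard deviation: sqrt(1/n1 + 1/n2)."""
    return math.sqrt(1.0 / n1 + 1.0 / n2)


def simulate_diff(n1, n2, true_diff=1.0, sd=1.0, reps=500, seed=0):
    """Repeatedly simulate two groups and estimate the difference in means.

    Returns (mean of the estimates, standard deviation of the estimates).
    Assumption: normal data with equal variance in both groups.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        g1 = [rng.gauss(true_diff, sd) for _ in range(n1)]
        g2 = [rng.gauss(0.0, sd) for _ in range(n2)]
        diffs.append(statistics.fmean(g1) - statistics.fmean(g2))
    return statistics.fmean(diffs), statistics.stdev(diffs)


if __name__ == "__main__":
    for n1, n2 in [(20, 880), (450, 450)]:
        mean_est, sd_est = simulate_diff(n1, n2)
        print(f"{n1} vs {n2}: theoretical SE factor = {se_factor(n1, n2):.3f}, "
              f"mean estimate = {mean_est:.3f}, empirical SE = {sd_est:.3f}")
```

In both designs the average estimate should sit near the true difference of 1.0, which is the "not biased" part. The standard error factor is about 0.226 for 20 vs 880 and about 0.067 for 450 vs 450, so the imbalanced design is roughly 3.4 times less precise for the same total number of samples. The precision is dominated by the smaller group, which is why the 880 controls add little beyond the first hundred or so.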