Question

DESeq2 with unbalanced experimental design

celine • 0

@celine-7449

Last seen 7.0 years ago

European Union

Dear all,

I am used to analysing RNA-seq data with the very useful and well-documented DESeq2 package. I have analysed an RNA-seq dataset containing 2 conditions (control and transgenic mice), with 3 replicates for the control condition and only 2 replicates for the transgenic one (we initially sequenced 3 transgenic samples, but the quality of one of the samples was not sufficient, so we had to exclude it from the analysis). We submitted a manuscript containing these analyses, but one of the reviewers wrote that “the RNA-seq performed is a 2 against 3 experiment and therefore the statistical analysis applied is not valid”.

As the DESeq2 Genome Biology article states that « experimental design with as little as two or three replicates are common and reasonable », I think it is valid to use DESeq2 with this number of replicates. Moreover, as the pasilla dataset used in the DESeq2 vignette contains a different number of replicates for each condition, I also assume that it is valid to use DESeq2 on an unbalanced experimental design.

I am aware that the power of the analysis would have been better with more replicates per condition and a balanced experimental design, but I would like confirmation that applying DESeq2 to such an experimental design is valid.

Thank you in advance for your answer.

Best regards,

Céline

deseq2 • 7.1k views

ADD COMMENT • link updated 14 months ago by Satoshi • 0 • written 10.8 years ago by celine • 0

Thank you very much for your quick and precise answer.

ADD REPLY • link 10.8 years ago celine • 0

score 1 · Answer 1 · 2015-03-10

Michael Love 43k

@mikelove

Last seen 9 days ago

United States

That's strange. Yes, the whole point of sharing information across genes in DESeq, edgeR, limma and others is to allow statistical inference when sample sizes are small. And yes, the DESeq2 methods (the generalized linear model) are valid when the groups are not balanced. You can just reply with a link to one of our figures showing the sensitivity for a 3 vs 3 comparison. 2 vs 3 will have slightly reduced sensitivity, but I'm not sure what other statistical analysis this person has in mind that would have higher sensitivity than methods that share information about variance estimates across genes.
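As a rough illustration of why 2 vs 3 costs only a little sensitivity compared with 3 vs 3, the sketch below compares the standard error of a difference in group means under the two designs. This is a simplification that assumes a plain two-group comparison with a common per-sample variance; it is not DESeq2's negative binomial GLM, and the function name `se_factor` is made up for this example.

```python
import math


def se_factor(n_a: int, n_b: int) -> float:
    """Factor multiplying sigma in the standard error of a difference
    in group means, assuming a common per-sample variance (an assumption
    made for illustration, not DESeq2's actual model)."""
    return math.sqrt(1.0 / n_a + 1.0 / n_b)


balanced = se_factor(3, 3)      # sqrt(1/3 + 1/3)
unbalanced = se_factor(2, 3)    # sqrt(1/2 + 1/3)
ratio = unbalanced / balanced   # sqrt(1.25)

print(f"3 vs 3: {balanced:.3f}, 2 vs 3: {unbalanced:.3f}, ratio: {ratio:.3f}")
# prints "3 vs 3: 0.816, 2 vs 3: 0.913, ratio: 1.118"
```

Under these assumptions, dropping one replicate from one group inflates the standard error by about 12%, which reduces power somewhat but does not make the inference invalid.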

ADD COMMENT • link 10.8 years ago • updated 4.2 years ago Michael Love 43k

Hi Mike, I now face the same question: a reviewer wrote that “the RNA-seq performed is a 2 against 3 experiment and therefore the statistical analysis applied is not valid”. I found that the URL to the figures in your reply no longer works. Could you provide it again? Thank you very much.

ADD REPLY • link 4.2 years ago 1227405668 • 0

Fixed the link

ADD REPLY • link 4.2 years ago Michael Love 43k

Hey Mike, thanks for the response. We have a similar issue, with 4 vs 8 samples across two conditions. This is single-cell data and I am pseudo-bulking the samples based on the recommendations of https://www.nature.com/articles/s41467-021-25960-2. What would you suggest here? I can think of 3 possible ways to do this: 1) pseudobulk and use DESeq2 to compare 4 vs 8 samples; 2) do a single-cell DESeq2 comparison using batch as a covariate; or 3) do a rank-sum test across cells, with bootstrapping to estimate the error, as you have done previously in one of your publications. Any feedback would be appreciated.
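For readers unfamiliar with the term, pseudo-bulking is commonly done by summing the raw counts of all cells that belong to the same sample, giving one count vector per sample. The sketch below shows that aggregation on toy data. The function name `pseudobulk`, the dictionary layout, and all cell, gene, and sample names are hypothetical, and it does not reproduce the exact procedure of the linked paper.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts into one count vector per sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, n in counts.items():
            totals[sample][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}


# Toy data (made up for illustration): three cells from two samples.
cell_counts = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 2},
    "cell3": {"GeneA": 0, "GeneB": 5},
}
cell_to_sample = {"cell1": "sample1", "cell2": "sample1", "cell3": "sample2"}

bulk = pseudobulk(cell_counts, cell_to_sample)
print(bulk)
# prints {'sample1': {'GeneA': 4, 'GeneB': 2}, 'sample2': {'GeneA': 0, 'GeneB': 5}}
```

The resulting samples-by-genes table is the kind of input a bulk method would take, with one column per biological sample rather than per cell.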

ADD REPLY • link 14 months ago Satoshi • 0

This is a 10-year-old thread. Would you mind creating a new post with full details about your setup?

ADD REPLY • link 14 months ago Michael Love 43k

I created one here: Robust way of dealing with low number of samples for Differential Gene Expression. Thanks!

ADD REPLY • link 14 months ago Satoshi • 0