I'd like to apply DESeq2 to breast cancer RNA-seq expression data to compare metastasis and non-metastasis patient groups. I have 880 samples in the non-metastasis group and only 20 samples in the metastasis group. I tried to find out whether it makes sense to use DESeq2 (or any other differential expression analysis) with such a difference in group sizes, but I could not find many resources to justify my study.
I only came across a few Biostars and Bioconductor posts asking about, for example, a 15 vs 3 sample comparison. As far as I understand, DESeq2 generally works fine with unbalanced sample sizes, but does that also hold for a 20 vs 880 sample comparison?
I also searched PubMed, and as far as I can see there are no studies addressing this problem.
Thank you in advance.
Michael, thank you so much for your quick reply.