We have single-cell sequencing data from two groups of cells, with 100 and 3000 cells respectively. We are interested in finding the genes that are differentially expressed between these two groups, and we would like to try MAST.

We are worried about the unbalanced group sizes, i.e., 100 vs. 3000. Will this affect the differential expression analysis, and if so, how? Is there a way to attenuate the effect?

Thanks!

Shao

To follow up on the question, I have one extreme case with only 3 cells in group 1 and 500 cells in group 2. Am I still on the safe side? How should I estimate the minimal disparity in the sample sizes? I have other cases with 20 to 50 cells in group 1 and 500 cells in group 2. Thank you.

Statistical validity (calibration of p-values) is unaffected by the unbalanced group sizes, but as above, power will be low. Can you clarify what is meant by "minimal disparity in the sample sizes"?
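To put rough numbers on the power point, here is a minimal sketch in Python. It is not MAST's hurdle model: it assumes a plain equal-variance difference in means with a normal approximation, which is optimistic for a group of 3 cells. The function names and the effect size of 0.5 standard deviations are hypothetical choices made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def se_factor(n1, n2):
    """Standard error of a difference in means, in units of the per-cell SD."""
    return sqrt(1.0 / n1 + 1.0 / n2)


def balanced_equivalent(n1, n2):
    """Per-group size of a balanced design with the same standard error."""
    return 2.0 * n1 * n2 / (n1 + n2)


def approx_power(n1, n2, effect_sd=0.5, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test.

    effect_sd is an assumed true difference in means, in SD units.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_sd / se_factor(n1, n2)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Group sizes mentioned in this thread.
for n1, n2 in [(100, 3000), (50, 500), (20, 500), (3, 500)]:
    print(
        f"{n1:>4} vs {n2:>4}: "
        f"SE factor = {se_factor(n1, n2):.3f}, "
        f"balanced equivalent = {balanced_equivalent(n1, n2):.1f} per group, "
        f"approx. power = {approx_power(n1, n2):.2f}"
    )
```

Under these assumptions the smaller group dominates the standard error: the balanced-equivalent size can never exceed twice the smaller group, so adding more cells to the large group helps very little once it is already much bigger.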