I posted this question three weeks ago on BioStars, but I'm posting it here too, hoping for some advice. I'm using DESeq2 to look for differentially expressed genes. I have three groups, each made up of different individuals: Treatment A, Treatment B, and Control.
Sample sizes (individuals) differ between the groups: A=9, B=5, Control=5.
If I want to test A directly against B, the difference in sample size is not a problem; I understand that (I already asked about it in this question).
But when I test A vs Control and want to see how that differs from B vs Control, I worry about the unequal sample sizes: I may detect more significant genes in A simply because its larger sample size gives more accurate dispersion estimates, not because treatment A has a stronger effect.
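To make the worry concrete, here is a rough sketch of the effect I have in mind. It is not DESeq2: it uses a plain two-sided two-sample z-test under a normal approximation, and the effect size, standard deviation, and alpha are assumptions chosen only for illustration. The function name `two_sample_power` is my own.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(n_treat, n_ctrl, effect=1.0, sd=1.0, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with known sd.

    effect, sd and alpha are illustrative assumptions, not values
    estimated from the real data.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two group means.
    se = sd * sqrt(1.0 / n_treat + 1.0 / n_ctrl)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / se
    # Probability that the test statistic lands in either rejection region.
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Same true effect in both treatments; only the group sizes differ.
power_a = two_sample_power(n_treat=9, n_ctrl=5)  # A vs Control
power_b = two_sample_power(n_treat=5, n_ctrl=5)  # B vs Control

print(f"A vs Control (9 vs 5): power = {power_a:.3f}")
print(f"B vs Control (5 vs 5): power = {power_b:.3f}")
```

With an identical true effect, the 9-vs-5 comparison has higher power than the 5-vs-5 one, so a longer gene list for A would not by itself show that A has the stronger effect.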
How would you suggest dealing with this problem? Should I randomly subsample 5 individuals from A (i.e. discard 4) and use only those in the comparison with the Controls? Or should I use all individuals in A but call significance with a stricter FDR threshold? If the latter, how do I know which FDR threshold is most suitable?
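For the subsampling option, I imagine drawing several random subsets rather than a single one, so that the result does not hinge on which 4 individuals happen to be dropped. A minimal sketch of the draw itself; the sample IDs, the seed, and the number of draws are made up for illustration, and `draw_subsets` is a hypothetical helper:

```python
import random

# Hypothetical IDs for the 9 individuals in Treatment A.
samples_a = [f"A{i}" for i in range(1, 10)]


def draw_subsets(samples, size, n_draws, seed=0):
    """Return n_draws random subsets of `size` samples, drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(samples, size)) for _ in range(n_draws)]


# Ten candidate subsets of 5 individuals, matching the size of B and Control.
subsets = draw_subsets(samples_a, size=5, n_draws=10)
for subset in subsets:
    print(subset)
```

Each subset would then be run against the 5 Controls in a separate analysis, and the number of significant genes compared across draws.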
Grateful for any advice.
Edit: Since Michael asked for a more explicit description of the dataset:
There are treatment A, treatment B, and Control (as described above), at four time points: day0 (before treatment), day1, day2, and day3. I want to compare A-vs-Ctrl at day0, day1, day2, and day3 (and over time with an LRT). Then I want to do the same separately for B-vs-Ctrl. Finally, I want to see how A-vs-Ctrl differs from B-vs-Ctrl at each time point. I expect these to differ a lot; for example, A may have 10 times as many significant genes on day1 as B.