Hi,
I posted this question three weeks ago on BioStars, but now I'm posting it here too, hoping for some advice. I'm using DESeq2 to look for differentially expressed genes. I have three groups with different individuals: Treatment A, Treatment B, and Control.
Sample sizes (individuals) differ between the groups: A=9, B=5, Control=5.
If I want to test A directly against B, the difference in sample size is not a problem, as I understand it (I already asked about that in this question).
But when I want to test A vs Control and see how that differs from B vs Control, I worry about the unequal sample sizes: I may detect more significant genes in A simply because the larger sample size gives more accurate dispersion estimates.
What would you suggest to deal with this problem? Should I randomly subsample 5 individuals from A (i.e. discard 4) and use only those in the comparison with the Controls? Or should I use all individuals in A but call significance with a stricter FDR threshold? If the latter, how do I know which FDR threshold is most suitable?
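To put a rough number on that worry, here is a minimal sketch in Python. It is not DESeq2: it assumes a simplified model in which every sample has the same standard deviation `sigma` and the test compares two group means, which is an assumption made only for illustration. The function name `se_diff` is hypothetical.

```python
import math

def se_diff(n1, n2, sigma=1.0):
    """Standard error of a difference of two group means,
    assuming equal per-sample standard deviation sigma (illustrative only)."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)

# Group sizes from the post: A=9, B=5, Control=5
se_a_vs_ctrl = se_diff(9, 5)
se_b_vs_ctrl = se_diff(5, 5)

print(f"SE(A - Ctrl) = {se_a_vs_ctrl:.3f}")
print(f"SE(B - Ctrl) = {se_b_vs_ctrl:.3f}")
print(f"ratio        = {se_a_vs_ctrl / se_b_vs_ctrl:.3f}")
```

Under these assumptions, the A-vs-Control estimate has a somewhat smaller standard error than B-vs-Control, so the same true effect is a little easier to detect in A. How large that difference is in a real negative binomial fit depends on the data.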
Grateful for any advice.
Edit: Since Michael asked for a more explicit description of the dataset:
There are three groups (treatment A, treatment B, and Control, as described above), each sampled at four time points, including day0 (before treatment). I want to compare A vs Ctrl at day0, day1, day2, and day3 (and over time with an LRT). Then I want to do the same separately for B vs Ctrl. Finally, I want to see how A vs Ctrl differs from B vs Ctrl at each time point. I expect these to differ a lot; for example, A may have 10 times as many significant genes at day1 as B.
You will have to tell us more about the biology of the experiment to get a good answer. Hence I will start with a comment, or rather a question:
Why do you think that looking for differences between A-Ctrl and B-Ctrl gives you an answer better fitting your biological question than simply comparing A to B? (What *is* your question exactly?)
Let's say A and B are some treatments.
Then the contrast A-Ctrl answers the question: For which genes do I have evidence that their expression is affected by treatment A? And the contrast A-to-B answers: For which genes do I have evidence that their expression is affected differently by B than by A?
What question might your A-Ctrl/B-Ctrl comparison answer?
A and B are two different treatments with different individuals. A vs Ctrl will tell me which genes are different in treatment A, and B vs Ctrl will tell me which genes are different in treatment B. I expect (and do get) quite different results from the two treatments, i.e. A vs Ctrl yields lots of significant genes, whereas not much is happening with B.
Comparing A vs B directly only tells me the difference between the two treatments, not how each of them differs from the controls, which is the important question. I need to analyse A vs Ctrl and B vs Ctrl separately over time, and discuss how the two treatments differ from the Ctrls in their responses.
As we had in a recent post on the support site: "Note that (C-A) - (B-A) = C - B."
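The identity can be checked with contrast vectors. The following is a minimal sketch in plain Python, not DESeq2 code; it assumes each group is represented by an indicator vector over the three group means, and the helper `sub` is a hypothetical name used only here.

```python
def sub(u, v):
    """Element-wise difference of two contrast vectors."""
    return tuple(x - y for x, y in zip(u, v))

# Indicator vectors over the group means, in the order (A, B, C)
A = (1, 0, 0)
B = (0, 1, 0)
C = (0, 0, 1)

# The quoted identity: (C - A) - (B - A) = C - B
lhs = sub(sub(C, A), sub(B, A))
rhs = sub(C, B)
print(lhs, rhs, lhs == rhs)

# The same cancellation for this thread, with C playing the role of Control:
# (A - Ctrl) - (B - Ctrl) = A - B
a_vs_ctrl = sub(A, C)
b_vs_ctrl = sub(B, C)
print(sub(a_vs_ctrl, b_vs_ctrl) == sub(A, B))
```

Note that this cancellation concerns the estimated effects (for example log fold changes). It says nothing about how two lists of genes called significant at some threshold will overlap.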
Sure, but that doesn't make sense in my case, since A and B are very different and I have several time points as well. So I need to analyse A vs Ctrl separately over time, and then B vs Ctrl separately over time. These two analyses, showing how A and B each compare to the controls, will be my main results. Only at the end do I intend to compare the treatment differences.