Say I have three samples with a simple design formula ~condition:
sampleA - control
sampleB - treatment (replicate 1)
sampleC - treatment (replicate 2)
DESeq2 returns a single log2 fold change (LFC) for treatment vs. control, but one could also consider the two comparisons separately: control vs. treatment replicate 1, and control vs. treatment replicate 2.
In the negative binomial (NB) GLM for a given gene, say geneX, there are three read counts Kij, one per sample (A, B, C).
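For reference, this is the model as I understand it from the DESeq2 paper, written out for this design. The notation (size factor s_j, dispersion alpha_i, coefficients beta) is taken from the paper as I recall it, and the 0/1 coding of the design matrix is my assumption about how ~condition is expanded with control as the reference level:

```latex
% NB GLM for gene i and sample j, assuming the parametrization in the DESeq2 paper
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr} \, \beta_{ir}

% Design ~condition with control as reference (assumed coding):
% sample A: x = (1, 0); samples B and C: x = (1, 1)
\log_2 q_{iA} = \beta_{i0}, \qquad
\log_2 q_{iB} = \log_2 q_{iC} = \beta_{i0} + \beta_{i1}
```

If I read this correctly, both treatment replicates share the same coefficient beta_i1, which would be the one reported LFC.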
Of course, if geneX had extremely low counts in replicate 1 but not in replicate 2, we would expect high variance in the LFC estimate from replicate 1, so the two replicates presumably shouldn't be weighted equally.
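To make the intuition about low counts concrete, here is a minimal sketch of the variance I have in mind. It is not anything DESeq2 computes; it is just a delta-method approximation, assuming independent NB counts with Var(K) = mu + alpha * mu^2. The function name and the example means and dispersion are made up for illustration:

```python
import math


def approx_var_log2_fold_change(mu_control, mu_treatment, alpha):
    """Delta-method approximation to Var(log2(K_treatment / K_control)).

    Assumes independent NB counts with Var(K) = mu + alpha * mu**2,
    so Var(ln K) is roughly Var(K) / mu**2 = 1 / mu + alpha.
    """
    var_ln = (1.0 / mu_control + alpha) + (1.0 / mu_treatment + alpha)
    # Convert from natural-log units to log2 units.
    return var_ln / math.log(2) ** 2


# Hypothetical expected counts: same control, a low-count vs. a high-count replicate.
var_low = approx_var_log2_fold_change(mu_control=100.0, mu_treatment=2.0, alpha=0.1)
var_high = approx_var_log2_fold_change(mu_control=100.0, mu_treatment=200.0, alpha=0.1)

print(f"approx. LFC variance, low-count replicate:  {var_low:.3f}")
print(f"approx. LFC variance, high-count replicate: {var_high:.3f}")
```

Under these assumptions the 1/mu term dominates at low counts, which is why I would expect a per-replicate LFC from the low-count replicate to be much noisier.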
It's not clear to me how DESeq2 arrives at a single value for this comparison. How are replicates 1 and 2 combined? Would the same explanation apply if we extended this example to, say, 5 control samples and 5 treatment samples?
A related question: which samples are used for the dispersion estimates? I read that the design formula is used to estimate the dispersions, which I don't entirely understand. I did see in the paper that the dispersion shrinkage decreases as the sample size increases. How else does the design formula affect the dispersion estimation?