When calculating dispersion, DESeq2 gives only one dispersion value for each gene, as can also be seen in the dispersion plot. I don't have much knowledge of statistics, but as far as I know we basically want to compare the distribution in condition 1 with the distribution in condition 2 and see whether there is a significant difference, as depicted here (Imgur link). So I would expect two dispersion estimates, one for each condition (in a two-condition experiment). I could imagine someone saying that the dispersion can be generalized across conditions, but why would that be the case? Say the expression of gene X is low in condition 1, so it could have high variability, whereas gene X is highly expressed in condition 2 and has low variability. Combining these into one dispersion estimate would then probably underestimate the dispersion for condition 1 and overestimate it for condition 2. Again, I have little knowledge of statistics, so what am I missing here?
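To make the scenario concrete, here is a small Python simulation. It is a sketch, not anything DESeq2 itself does. It models a hypothetical gene X with mean 20 and dispersion 0.5 in condition 1 and mean 500 and dispersion 0.02 in condition 2, using the negative binomial variance mu + alpha * mu^2. Dispersion is estimated with a simple method-of-moments formula rather than DESeq2's likelihood-based estimator, and the "shared" value is just a degrees-of-freedom-weighted average of the two per-condition estimates. With a deliberately huge number of samples, sampling noise is negligible and only the bias remains: the shared value lands between the two true dispersions.

```python
import numpy as np

def simulate_nb(rng, mu, alpha, size):
    """Draw negative binomial counts with mean mu and variance mu + alpha * mu**2."""
    n = 1.0 / alpha                  # numpy's "number of successes" parameter
    p = 1.0 / (1.0 + alpha * mu)     # success probability giving the desired mean
    return rng.negative_binomial(n, p, size=size)

def mom_dispersion(counts):
    """Method-of-moments dispersion: solve var = mean + alpha * mean**2 for alpha (clipped at 0)."""
    m = counts.mean()
    v = counts.var(ddof=1)
    return max((v - m) / m**2, 0.0)

rng = np.random.default_rng(1)
n_per_group = 10_000  # unrealistically large so that only bias, not noise, is visible

# Hypothetical gene X: low and noisy in condition 1, high and stable in condition 2.
cond1 = simulate_nb(rng, mu=20, alpha=0.5, size=n_per_group)
cond2 = simulate_nb(rng, mu=500, alpha=0.02, size=n_per_group)

alpha1 = mom_dispersion(cond1)
alpha2 = mom_dispersion(cond2)
# One shared value: degrees-of-freedom-weighted average of the per-condition estimates.
alpha_pooled = (alpha1 * (len(cond1) - 1) + alpha2 * (len(cond2) - 1)) / (len(cond1) + len(cond2) - 2)

print(f"condition 1: {alpha1:.3f}  condition 2: {alpha2:.3f}  shared: {alpha_pooled:.3f}")
```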

The core issue is that if you want to estimate more parameters (such as a separate dispersion for each group), you need more data. DESeq2 shares dispersion information between genes to make up for the fact that the data for a single gene are already insufficient to yield a stable estimate of even a single dispersion parameter. Now you are suggesting estimating additional dispersion parameters when the data barely support estimating the first one. Replacing one decently estimated dispersion with two poorly estimated dispersions is not going to increase your statistical power, even if it reduces the bias in the dispersion estimate.
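The variance side of that trade-off shows up in a quick simulation. This Python sketch assumes both conditions truly share the same dispersion (0.1), uses only 3 replicates per condition, and again uses a simple method-of-moments estimator rather than DESeq2's. It also pools only across the two conditions of one gene, not across genes as DESeq2 does. Each row is one simulated gene. The shared estimate uses all 6 samples, and its spread across simulated genes is smaller than that of either 3-sample, per-condition estimate.

```python
import numpy as np

def simulate_nb(rng, mu, alpha, size):
    """Negative binomial counts with mean mu and variance mu + alpha * mu**2."""
    n = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mu)
    return rng.negative_binomial(n, p, size=size)

def mom_dispersion(counts):
    """Row-wise method-of-moments dispersion estimate, clipped at zero."""
    m = counts.mean(axis=1)
    v = counts.var(axis=1, ddof=1)
    return np.clip((v - m) / m**2, 0.0, None)

rng = np.random.default_rng(2)
n_genes, n_reps, true_alpha = 10_000, 3, 0.1

# Same true dispersion in both conditions, different means.
cond1 = simulate_nb(rng, mu=100, alpha=true_alpha, size=(n_genes, n_reps))
cond2 = simulate_nb(rng, mu=200, alpha=true_alpha, size=(n_genes, n_reps))

alpha1 = mom_dispersion(cond1)          # two parameters, 3 samples each
alpha2 = mom_dispersion(cond2)
alpha_shared = (alpha1 + alpha2) / 2    # equal group sizes, so the df-weighted average is a plain mean

for name, est in [("condition 1 only", alpha1), ("condition 2 only", alpha2), ("shared", alpha_shared)]:
    print(f"{name:17s} median={np.median(est):.3f}  sd={est.std():.3f}")
```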

The bottom line is that you can't just add additional parameters to a model for free. You need to have enough data to estimate those additional parameters. If you had hundreds of RNA-seq samples in each group, then it might make sense to implement your own test which estimates the dispersion separately for each group.
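Conversely, the same simple estimator settles down as replicates accumulate. This sketch uses a hypothetical mean of 100, a true dispersion of 0.1, 2,000 simulated genes per sample size, and a method-of-moments estimator. It shows the spread of a single group's dispersion estimate shrinking as the group grows from 3 to 500 samples, which is the regime where estimating a separate dispersion per group becomes feasible.

```python
import numpy as np

def simulate_nb(rng, mu, alpha, size):
    """Negative binomial counts with mean mu and variance mu + alpha * mu**2."""
    n = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mu)
    return rng.negative_binomial(n, p, size=size)

def mom_dispersion(counts):
    """Row-wise method-of-moments dispersion estimate, clipped at zero."""
    m = counts.mean(axis=1)
    v = counts.var(axis=1, ddof=1)
    return np.clip((v - m) / m**2, 0.0, None)

rng = np.random.default_rng(3)
true_alpha, mu, n_sims = 0.1, 100, 2_000
spreads = {}

for n in (3, 10, 100, 500):
    counts = simulate_nb(rng, mu, true_alpha, size=(n_sims, n))
    est = mom_dispersion(counts)
    # Fraction of simulated genes whose estimate lands within 25% of the true dispersion.
    within = np.mean(np.abs(est - true_alpha) <= 0.25 * true_alpha)
    spreads[n] = est.std()
    print(f"n={n:4d}  sd={est.std():.3f}  within 25% of truth: {within:.0%}")
```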

Aaah I see. Although I don't follow the statistics after the dispersion estimation yet (I'm currently reading the paper), I imagine the model would be something like y = b0 + b1 * condition2, where b0 = mean(condition1) and b1 is the increase in mean for condition 2, and we then test H0: b1 == 0. But how does such a model account for dispersion? Can you explain this in plain English?
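One way to see where dispersion enters, under simplifying assumptions: in a negative binomial GLM with a log link (the kind of model DESeq2 fits), the model is written on the log scale, log(mu) = b0 + b1 * condition2, so b0 is the log of the condition 1 mean and b1 is a log fold change. For this two-group design the fitted b1 does not depend on the dispersion alpha (it is just the log ratio of the group means). However, alpha sets the variance mu + alpha * mu^2 of each count, which sets the standard error of b1 and therefore the Wald test of H0: b1 == 0. This Python sketch assumes equal library sizes (no size factors), treats alpha as known, and uses made-up counts; the closed-form variance comes from inverting the Fisher information for this design.

```python
import math

def wald_test_two_groups(cond1, cond2, alpha):
    """Wald test of H0: b1 == 0 in log(mu) = b0 + b1 * condition2.

    Assumes negative binomial counts with a known dispersion alpha and no size factors.
    """
    n1, n2 = len(cond1), len(cond2)
    mu1 = sum(cond1) / n1          # MLE of exp(b0): the condition 1 mean
    mu2 = sum(cond2) / n2          # MLE of exp(b0 + b1): the condition 2 mean
    b0 = math.log(mu1)
    b1 = math.log(mu2 / mu1)       # natural-log fold change
    # Inverse Fisher information for this design. Each count's variance
    # mu + alpha * mu**2 enters through the GLM weight mu / (1 + alpha * mu).
    var_b1 = (1 / mu1 + alpha) / n1 + (1 / mu2 + alpha) / n2
    se_b1 = math.sqrt(var_b1)
    z = b1 / se_b1
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return b0, b1, se_b1, z, p

counts1 = [80, 95, 110]    # made-up counts, condition 1
counts2 = [150, 170, 190]  # made-up counts, condition 2
for alpha in (0.01, 0.1, 0.5):
    _, b1, se, z, p = wald_test_two_groups(counts1, counts2, alpha)
    print(f"alpha={alpha:<4}  b1={b1:.3f}  SE={se:.3f}  z={z:.2f}  p={p:.2g}")
```

The same fold change gives a weaker test when the dispersion is larger, which is also why an underestimated dispersion would produce overconfident p-values.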

DESeq2 is a pretty standard GLM but with a hierarchical model for dispersion. It might help to read some background on GLMs first (there are lots of free courses online). Your question is more about how GLMs work than about DESeq2 in particular.
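The "hierarchical" part can be illustrated with a deliberately simplified Python sketch. It is not DESeq2's procedure, which is more involved (it shrinks each gene's estimate toward a fitted dispersion-mean trend using a prior estimated from all genes). Instead, this sketch treats a gene's own log-dispersion estimate and the trend value as two normal measurements and takes a precision-weighted average. All numbers, including the two variances, are hypothetical. A gene with few replicates has a noisy estimate and gets pulled toward the trend; with many replicates its own estimate dominates.

```python
import math

def shrink_log_dispersion(gene_alpha, trend_alpha, sampling_var, prior_var):
    """Precision-weighted average of a gene's own log-dispersion and the trend value.

    sampling_var: hypothetical noise of the gene's own log-dispersion estimate
                  (large when there are few replicates)
    prior_var:    hypothetical spread of true log-dispersions around the trend
    """
    w_gene = 1.0 / sampling_var
    w_prior = 1.0 / prior_var
    log_shrunk = (w_gene * math.log(gene_alpha) + w_prior * math.log(trend_alpha)) / (w_gene + w_prior)
    return math.exp(log_shrunk)

trend = 0.05          # hypothetical trend value for genes with this mean expression
gene_estimate = 0.4   # hypothetical noisy per-gene estimate

few_reps = shrink_log_dispersion(gene_estimate, trend, sampling_var=1.0, prior_var=0.25)
many_reps = shrink_log_dispersion(gene_estimate, trend, sampling_var=0.01, prior_var=0.25)
print(f"few replicates: {few_reps:.3f}  many replicates: {many_reps:.3f}")
```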