Can someone help me understand how DESeq2 calculates dispersion?
I don't understand what DESeq2 does:
1) If we consider two conditions A and B with n samples in each condition, is information shared between all samples from both conditions? That is, is the mean normalized count calculated from the counts of cond. A (22) and cond. B (25), so that µ = 23.5? And is the dispersion then calculated according to the µ values of all genes?
But in that case, my understanding is that differentially expressed genes will not necessarily be more dispersed than genes that are highly expressed but not differentially expressed between cond. A and B. Is that true?
An example:
         cond. A   cond. B   µ
gene 1   22        25        23.5
gene 2   572       620       596
gene 3   40        700       370
In this case, if the "common" mean normalized count for all genes is around 100, gene 3 will be less dispersed than gene 2 but more differentially expressed (and gene 3 will not be an outlier circled in blue on the dispersion plot, but its dispersion will be shrunk by the MAP procedure).
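To make scenario 1 concrete, here is a minimal Python sketch (not DESeq2 code, and not DESeq2's estimator). It assumes the negative binomial variance relation Var = µ + α·µ² and solves it for α by the method of moments, treating the two example counts of each gene as if they came from one pooled group. The function name `pooled_moments_dispersion` is made up for this illustration.

```python
from statistics import mean, variance

# Counts from the example above: one value per condition (A, B).
counts = {
    "gene 1": (22, 25),
    "gene 2": (572, 620),
    "gene 3": (40, 700),
}

def pooled_moments_dispersion(values):
    """Method-of-moments dispersion around a single pooled mean.

    Assumes Var = mu + alpha * mu**2 and solves for alpha,
    clamping at 0 when the sample variance is below the mean.
    """
    mu = mean(values)
    var = variance(values)  # sample variance (n - 1 denominator)
    return mu, max((var - mu) / mu**2, 0.0)

results = {gene: pooled_moments_dispersion(v) for gene, v in counts.items()}
for gene, (mu, alpha) in results.items():
    print(f"{gene}: pooled mean = {mu:g}, dispersion = {alpha:.4f}")
```

Under this pooled-mean reading, the A/B difference of gene 3 is counted as spread, so gene 3 gets by far the largest value of the three.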
2) Or does DESeq2 calculate the mean normalized counts separately for cond. A (µA) and cond. B (µB)? Does it then calculate a dispersion for each condition and evaluate the difference between the dispersions of cond. A and cond. B? To me, this proposal looks much more like a test of differential expression than a dispersion calculation.
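Scenario 2 can be sketched the same way. This is again only an illustration of the question, not DESeq2's procedure: the replicate counts below are hypothetical (invented for one gene, loosely based on gene 3), and the method-of-moments formula assumes Var = µ + α·µ².

```python
from math import log2
from statistics import mean, variance

# Hypothetical replicate counts for a single gene (made up for illustration).
cond_a = [30, 40, 50]
cond_b = [600, 700, 800]

def moments_dispersion(values):
    """Per-group mean and method-of-moments dispersion (clamped at 0)."""
    mu = mean(values)
    return mu, max((variance(values) - mu) / mu**2, 0.0)

mu_a, alpha_a = moments_dispersion(cond_a)
mu_b, alpha_b = moments_dispersion(cond_b)

# The difference between conditions lives in the means (fold change),
# while each dispersion only describes spread within its own condition.
print(f"muA = {mu_a:g}, muB = {mu_b:g}, log2 fold change = {log2(mu_b / mu_a):.2f}")
print(f"alphaA = {alpha_a:.4f}, alphaB = {alpha_b:.4f}")
```

With these made-up numbers the two means differ strongly while the two per-condition dispersions stay small and of similar size, which is why comparing dispersions is not the same thing as comparing means.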
Is either of these two proposals correct? Am I misunderstanding something? What are the mistakes in my explanation?
Any comments or help with understanding this would be greatly appreciated. Thank you, E.