I wonder how did you calculate the "baseMean counts" in the output of DESeq2. I looked the Genome Biology paper and additional files, it seems you used mean counts or mean expression there. If you give the "baseMean counts" as an output, it seems you have normalised the data against gene lengths (RPKM), but I didn't input anything related to gene lengths in the analysis with DESeq2. Did you estimate gene lengths or just using a same length for all the genes or should it be "mean CPM counts"?
The base mean is the mean of normalized counts of all samples, normalizing
for sequencing depth.
It does not take into account gene length. The base mean is used in DESeq2
only for estimating the dispersion of a gene (it is used to estimate the
fitted dispersion). For this task, the range of counts for a gene is
relevant but not the gene's length (or other technical factors influencing
the count, like sequence content).
But how is it calculated?
counts(dds, normalized=TRUE) is a matrix of elements: K_ij / s_j (see DESeq2 paper for definitions)
If there were outliers replaced you also need
replaced=TRUE
. The following evaluates to TRUE: