The documentation of the DESeq sizeFactors, as described in the supplement to the DEXSeq paper says:
"we multiply them with a suitably chosen constant c ... such that [product of final sizeFactors] = 1. This keeps the size factors close to unity and ensures that normalized count values ... remain close to their original values, making their interpretation, e. g. in plots, easier."
But I find that the product of the sizeFactors generated by DESeq and DESeq2 is not quite 1.
When comparing sizeFactors generated with different methods or with different subsets of genes, it would be nice to have them all conform to the above restriction.
The code below is the relevant source code from DESeq2_1.10.1 It does not seem to have any constant that would make the product be 1. A one-liner of my own code is shown to do the trick.
If the authors have a response, I would appreciate it.
Thanks,
Sol Katzman
UC Santa Cruz
-------------------------------------
DEXSeq reference: "Detecting differential usage of exons from RNA-seq data", Genome Research 2012.
--------------------------------------
R source code from DESeq2:
> estimateSizeFactorsForMatrix
function (counts, locfunc = stats::median, geoMeans, controlGenes)
{
if (missing(geoMeans)) {
loggeomeans <- rowMeans(log(counts))
}
...
sf <- if (missing(controlGenes)) {
apply(counts, 2, function(cnts) {
exp(locfunc((log(cnts) - loggeomeans)[is.finite(loggeomeans) &
cnts > 0]))
})
}
}
----------------------------------------------
My code for making product = 1:
# step3: multiply median-ratios by a constant so that product is 1
# prod(c*medians) = 1
# c^n * prod(medians) = 1
# c^n = 1/prod(medians)
# n*log(c) = - sum (log (medians))
# log(c) = -(1/n) * sum (log (medians))
# c = exp (-1/n * sum (log(medians))
constant <- exp ((-1/length(medianRatios)) * sum (log (medianRatios)))
my.sizeFactors <- constant*medianRatios

