Search
Question: DESeq2 product of sizeFactors is not 1
1
gravatar for katzman
12 months ago by
katzman10
katzman10 wrote:

The documentation of the DESeq sizeFactors, as described in the supplement to the DEXSeq paper says:

"we multiply them with a suitably chosen constant c ... such that [product of final sizeFactors] = 1. This keeps the size factors close to unity and ensures that normalized count values ... remain close to their original values, making their interpretation, e. g. in plots, easier."

But I find that the product of the sizeFactors generated by DESeq and DESeq2 is not quite 1.

When comparing sizeFactors generated with different methods or with different subsets of genes, it would be nice to have them all conform to the above restriction.

The code below is the relevant source code from DESeq2_1.10.1 It does not seem to have any constant that would make the product be 1. A one-liner of my own code is shown to do the trick.

If the authors have a response, I would appreciate it.

Thanks,

Sol Katzman

UC Santa Cruz

-------------------------------------

DEXSeq reference: "Detecting differential usage of exons from RNA-seq data", Genome Research 2012.

--------------------------------------

R source code from DESeq2:

> estimateSizeFactorsForMatrix
function (counts, locfunc = stats::median, geoMeans, controlGenes)
{
    if (missing(geoMeans)) {
        loggeomeans <- rowMeans(log(counts))
    }
...

   sf <- if (missing(controlGenes)) {
        apply(counts, 2, function(cnts) {
            exp(locfunc((log(cnts) - loggeomeans)[is.finite(loggeomeans) &
                cnts > 0]))
        })
    }

}

----------------------------------------------
My code for making product = 1:

# step3: multiply median-ratios by a constant so that product is 1
#        prod(c*medians) = 1
#        c^n * prod(medians) = 1
#        c^n = 1/prod(medians)
#        n*log(c) = - sum (log (medians))
#        log(c)   = -(1/n) * sum (log (medians))
#        c = exp (-1/n * sum (log(medians))
constant <- exp ((-1/length(medianRatios)) * sum (log (medianRatios)))
my.sizeFactors <- constant*medianRatios

 

 

 
ADD COMMENTlink modified 12 months ago by Michael Love11k • written 12 months ago by katzman10
0
gravatar for Michael Love
12 months ago by
Michael Love11k
United States
Michael Love11k wrote:
This is a good question for Simon, to see if it's really important that the product be *exactly* 1. The size factor estimation currently in DESeq1/2 is in my opinion sufficient to accomplish the goal that the mean of normalized counts is on a similar scale to the mean of counts. For code stability I'm disinclined to make any algorithm changes to DESeq2.
ADD COMMENTlink written 12 months ago by Michael Love11k

I agree with Mike that there is essentially no reason for the product to be exactly 1, but I can also see that it is a bit ugly that it isn't. But I guess Mike is right: fixing this might break things, and as it really does not influence the inference, we'll better leave it as is.

BTW, here is a slightly nicer code to make the size factors' geometric mean 1

cds <- estimateSizeFactors( cds )
sizeFactors(cds) <- sizeFactors(cds) / exp(mean(log( sizeFactors(cds) )))

 

 

ADD REPLYlink written 12 months ago by Simon Anders3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 133 users visited in the last hour