DESeq2, geometric means using posCounts
Entering edit mode
Phil ▴ 10
Last seen 20 months ago
United States

I'm trying to get a better idea of what happens when calling estimateSizeFactors with a type of "poscounts". I assumed from the help description that Geometric mean would still be used after the modified zero-handling. However, I found the source code below that I think is calculating the Arithmetic mean after zero-handling. I may have looked at the wrong source code however. Can you please verify that Geometric mean is still used?

On a related note, when using the default type of "ratio" instead - what library is the rowMeans () function coming from?

Found this in DESeq2/methods.R in function estimateSizeFactorsForMatrix:

if (type == "poscounts") {
  geoMeanNZ <- function(x) {
    if (all(x == 0)) { 0 } else { exp( sum(log(x[x > 0])) / length(x) ) }
  geoMeans <- apply(counts(object), 1, geoMeanNZ)

From help: The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts.

Thank you, Phillipe Loher

DESeq2 • 738 views
Entering edit mode
Last seen 1 day ago
United States

Why do you think it's computing the arithmetic mean? There's this, right in the middle there sum(log(x[x > 0])), right?

And rowMeans is a base function, which you can find out by looking at the help page. From ?rowMeans

colSums                  package:base                  R Documentation

Form Row and Column Sums and Means


     Form row and column sums and means for numeric arrays (or data


Or just by looking at the function itself

> rowMeans
function (x, na.rm = FALSE, dims = 1L) 
    if ( 
        x <- as.matrix(x)
    if (!is.array(x) || length(dn <- dim(x)) < 2L) 
        stop("'x' must be an array of at least two dimensions")
    if (dims < 1L || dims > length(dn) - 1L) 
        stop("invalid 'dims'")
    p <- prod(dn[-(id <- seq_len(dims))])
    dn <- dn[id]
    z <- if (is.complex(x)) 
        .Internal(rowMeans(Re(x), prod(dn), p, na.rm)) + (0+1i) * 
            .Internal(rowMeans(Im(x), prod(dn), p, na.rm))
    else .Internal(rowMeans(x, prod(dn), p, na.rm))
    if (length(dn) > 1L) {
        dim(z) <- dn
        dimnames(z) <- dimnames(x)[id]
    else names(z) <- dimnames(x)[[1L]]
<bytecode: 0x00000000198f7f20>
<environment: namespace:base>

Note the last line.

Entering edit mode

Thank you for the fast reply. Yes you are indeed correct, I forgot you can replace the nth-root calculations by logging first. Thanks for the sanity check.


Login before adding your answer.

Traffic: 207 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6