Question

DESeq2, geometric means using posCounts

1

Entering edit mode

Phil ▴ 10

@820837e1

Last seen 2.8 years ago

United States

I'm trying to get a better idea of what happens when calling estimateSizeFactors with a type of "poscounts". I assumed from the help description that Geometric mean would still be used after the modified zero-handling. However, I found the source code below that I think is calculating the Arithmetic mean after zero-handling. I may have looked at the wrong source code however. Can you please verify that Geometric mean is still used?

On a related note, when using the default type of "ratio" instead - what library is the rowMeans () function coming from?

Found this in DESeq2/methods.R in function estimateSizeFactorsForMatrix:

if (type == "poscounts") {
  geoMeanNZ <- function(x) {
    if (all(x == 0)) { 0 } else { exp( sum(log(x[x > 0])) / length(x) ) }
  }
  geoMeans <- apply(counts(object), 1, geoMeanNZ)

From help: The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts.

Thank you, Phillipe Loher

DESeq2 • 1.2k views

ADD COMMENT • link 2.8 years ago Phil ▴ 10

score 2 · Accepted Answer · 2021-07-19

Why do you think it's computing the arithmetic mean? There's this, right in the middle there sum(log(x[x > 0])), right?

And rowMeans is a base function, which you can find out by looking at the help page. From ?rowMeans

colSums                  package:base                  R Documentation

Form Row and Column Sums and Means

Description:

     Form row and column sums and means for numeric arrays (or data
     frames).

Usage:

Or just by looking at the function itself

> rowMeans
function (x, na.rm = FALSE, dims = 1L) 
{
    if (is.data.frame(x)) 
        x <- as.matrix(x)
    if (!is.array(x) || length(dn <- dim(x)) < 2L) 
        stop("'x' must be an array of at least two dimensions")
    if (dims < 1L || dims > length(dn) - 1L) 
        stop("invalid 'dims'")
    p <- prod(dn[-(id <- seq_len(dims))])
    dn <- dn[id]
    z <- if (is.complex(x)) 
        .Internal(rowMeans(Re(x), prod(dn), p, na.rm)) + (0+1i) * 
            .Internal(rowMeans(Im(x), prod(dn), p, na.rm))
    else .Internal(rowMeans(x, prod(dn), p, na.rm))
    if (length(dn) > 1L) {
        dim(z) <- dn
        dimnames(z) <- dimnames(x)[id]
    }
    else names(z) <- dimnames(x)[[1L]]
    z
}
<bytecode: 0x00000000198f7f20>
<environment: namespace:base>

Note the last line.