DESeq2, geometric means using posCounts
1
1
Entering edit mode
Phil ▴ 10
@820837e1
Last seen 2.8 years ago
United States

I'm trying to get a better idea of what happens when calling estimateSizeFactors with a type of "poscounts". I assumed from the help description that Geometric mean would still be used after the modified zero-handling. However, I found the source code below that I think is calculating the Arithmetic mean after zero-handling. I may have looked at the wrong source code however. Can you please verify that Geometric mean is still used?

On a related note, when using the default type of "ratio" instead - what library is the rowMeans () function coming from?

Found this in DESeq2/methods.R in function estimateSizeFactorsForMatrix:

if (type == "poscounts") {
  geoMeanNZ <- function(x) {
    if (all(x == 0)) { 0 } else { exp( sum(log(x[x > 0])) / length(x) ) }
  }
  geoMeans <- apply(counts(object), 1, geoMeanNZ)

From help: The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts.

Thank you, Phillipe Loher

DESeq2 • 1.2k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 16 hours ago
United States

Why do you think it's computing the arithmetic mean? There's this, right in the middle there sum(log(x[x > 0])), right?

And rowMeans is a base function, which you can find out by looking at the help page. From ?rowMeans

colSums                  package:base                  R Documentation

Form Row and Column Sums and Means

Description:

     Form row and column sums and means for numeric arrays (or data
     frames).

Usage:

Or just by looking at the function itself

> rowMeans
function (x, na.rm = FALSE, dims = 1L) 
{
    if (is.data.frame(x)) 
        x <- as.matrix(x)
    if (!is.array(x) || length(dn <- dim(x)) < 2L) 
        stop("'x' must be an array of at least two dimensions")
    if (dims < 1L || dims > length(dn) - 1L) 
        stop("invalid 'dims'")
    p <- prod(dn[-(id <- seq_len(dims))])
    dn <- dn[id]
    z <- if (is.complex(x)) 
        .Internal(rowMeans(Re(x), prod(dn), p, na.rm)) + (0+1i) * 
            .Internal(rowMeans(Im(x), prod(dn), p, na.rm))
    else .Internal(rowMeans(x, prod(dn), p, na.rm))
    if (length(dn) > 1L) {
        dim(z) <- dn
        dimnames(z) <- dimnames(x)[id]
    }
    else names(z) <- dimnames(x)[[1L]]
    z
}
<bytecode: 0x00000000198f7f20>
<environment: namespace:base>

Note the last line.

ADD COMMENT
0
Entering edit mode

Thank you for the fast reply. Yes you are indeed correct, I forgot you can replace the nth-root calculations by logging first. Thanks for the sanity check.

ADD REPLY

Login before adding your answer.

Traffic: 432 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6