DESeq2, geometric means using posCounts
1
1
Entering edit mode
Phil ▴ 10
@820837e1
Last seen 20 months ago
United States

I'm trying to get a better idea of what happens when calling estimateSizeFactors with a type of "poscounts". I assumed from the help description that Geometric mean would still be used after the modified zero-handling. However, I found the source code below that I think is calculating the Arithmetic mean after zero-handling. I may have looked at the wrong source code however. Can you please verify that Geometric mean is still used?

On a related note, when using the default type of "ratio" instead - what library is the rowMeans () function coming from?

Found this in DESeq2/methods.R in function estimateSizeFactorsForMatrix:

if (type == "poscounts") {
geoMeanNZ <- function(x) {
if (all(x == 0)) { 0 } else { exp( sum(log(x[x > 0])) / length(x) ) }
}
geoMeans <- apply(counts(object), 1, geoMeanNZ)


From help: The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts.

Thank you, Phillipe Loher

DESeq2 • 738 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Why do you think it's computing the arithmetic mean? There's this, right in the middle there sum(log(x[x > 0])), right?

And rowMeans is a base function, which you can find out by looking at the help page. From ?rowMeans

colSums                  package:base                  R Documentation

Form Row and Column Sums and Means

Description:

Form row and column sums and means for numeric arrays (or data
frames).

Usage:


Or just by looking at the function itself

> rowMeans
function (x, na.rm = FALSE, dims = 1L)
{
if (is.data.frame(x))
x <- as.matrix(x)
if (!is.array(x) || length(dn <- dim(x)) < 2L)
stop("'x' must be an array of at least two dimensions")
if (dims < 1L || dims > length(dn) - 1L)
stop("invalid 'dims'")
p <- prod(dn[-(id <- seq_len(dims))])
dn <- dn[id]
z <- if (is.complex(x))
.Internal(rowMeans(Re(x), prod(dn), p, na.rm)) + (0+1i) *
.Internal(rowMeans(Im(x), prod(dn), p, na.rm))
else .Internal(rowMeans(x, prod(dn), p, na.rm))
if (length(dn) > 1L) {
dim(z) <- dn
dimnames(z) <- dimnames(x)[id]
}
else names(z) <- dimnames(x)[[1L]]
z
}
<bytecode: 0x00000000198f7f20>
<environment: namespace:base>


Note the last line.

0
Entering edit mode

Thank you for the fast reply. Yes you are indeed correct, I forgot you can replace the nth-root calculations by logging first. Thanks for the sanity check.