Block weights in scran::pairwiseTTests
1
0
Entering edit mode
@angelos-armen-21507
Last seen 16 months ago
United Kingdom

Hi Aaron,

In the code of scran::pairwiseTTests (current version on Github), you have this commented out:

In theory, a better weighting scheme would be to use the estimated standard error of the log-fold change to compute the weight. This would be more responsive to differences in variance between blocking levels, focusing on levels with low variance and high power. However, this is not safe in practice as genes with many zeroes can have very low standard errors, dominating the results inappropriately.

If I understand correctly, are you referring to a situation like in the example below? Here I have a single gene, two blocks, and two groups. The sample size is equal across blocks and groups. The fold change is the same in the two blocks, but the first block has more zeroes. Weighting by the reciprocal of the squared standard error would indeed result in the first block dominating the result.

library(metapod)

# equal sample size across groups and blocks
n <- 100

# create counts with n_1 ones and n - n_1 zeroes
create_counts <- function(n_1) c(rep(1, n_1), rep(0, n - n_1))

# create block 1 counts
x_1 <- create_counts(1)
y_1 <- create_counts(2)

# create block 2 counts
x_2 <- create_counts(10)
y_2 <- create_counts(20)

# run t-tests per block
out_1 <- t.test(x_1, y_1)
out_2 <- t.test(x_2, y_2)

# get p-values and weights
p_values <- list(out_1$p.value, out_2$p.value)
p_values
weights <- c(1/out_1$stderr^2, 1/out_2$stderr^2)
weights


[[1]] [1] 0.5631136

[[2]] [1] 0.04807781

[1] 3355.932 396.000

# combine p-values
combineParallelPValues(p_values, method = "stouffer", weights = weights)$p.value # combine p-values without weights (as sample sizes are equal) combineParallelPValues(p_values, method = "stouffer")$p.value


[1] 0.4851627 [1] 0.1436335

scran • 525 views
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 7 hours ago
The city by the bay

Yes, especially with the large number of zeroes that deflate the variance in low-coverage blocks. This makes things unpredictable and unintuitive - for example, the precision weighting favors low-coverage blocks with few cells where you might be lucky enough to get an all-zero group that drives the pooled variance towards zero.

0
Entering edit mode

Thank you very much Aaron.