Block weights in scran::pairwiseTTests
Entering edit mode
Last seen 1 day ago
United Kingdom

Hi Aaron,

In the code of scran::pairwiseTTests (current version on Github), you have this commented out:

In theory, a better weighting scheme would be to use the estimated standard error of the log-fold change to compute the weight. This would be more responsive to differences in variance between blocking levels, focusing on levels with low variance and high power. However, this is not safe in practice as genes with many zeroes can have very low standard errors, dominating the results inappropriately.

If I understand correctly, are you referring to a situation like in the example below? Here I have a single gene, two blocks, and two groups. The sample size is equal across blocks and groups. The fold change is the same in the two blocks, but the first block has more zeroes. Weighting by the reciprocal of the squared standard error would indeed result in the first block dominating the result.


# equal sample size across groups and blocks
n <- 100

# create counts with n_1 ones and n - n_1 zeroes
create_counts <- function(n_1) c(rep(1, n_1), rep(0, n - n_1))

# create block 1 counts
x_1 <- create_counts(1)
y_1 <- create_counts(2)

# create block 2 counts
x_2 <- create_counts(10)
y_2 <- create_counts(20)

# run t-tests per block
out_1 <- t.test(x_1, y_1)
out_2 <- t.test(x_2, y_2)
# get p-values and weights
p_values <- list(out_1$p.value, out_2$p.value)
weights <- c(1/out_1$stderr^2, 1/out_2$stderr^2)

[[1]] [1] 0.5631136

[[2]] [1] 0.04807781

[1] 3355.932 396.000

# combine p-values 
combineParallelPValues(p_values, method = "stouffer", weights = weights)$p.value

# combine p-values without weights (as sample sizes are equal)
combineParallelPValues(p_values, method = "stouffer")$p.value

[1] 0.4851627 [1] 0.1436335

scran • 122 views
Entering edit mode
Aaron Lun ★ 27k
Last seen 12 hours ago
The city by the bay

Yes, especially with the large number of zeroes that deflate the variance in low-coverage blocks. This makes things unpredictable and unintuitive - for example, the precision weighting favors low-coverage blocks with few cells where you might be lucky enough to get an all-zero group that drives the pooled variance towards zero.

Entering edit mode

Thank you very much Aaron.


Login before adding your answer.

Traffic: 228 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6