Question

IHW scale of problem

xiaotong.yao23 ▴ 10

@xiaotongyao23-14986

Last seen 4.9 years ago

Hi,

I'm trying to use the IHW package to compute covariate-weighted p-values for about 35 million hypothesis tests, but the function has been running for 5 hours and still hasn't returned or thrown an error. I'm wondering whether 35 million is stretching its scale, and what the maximum number of hypotheses is that it can reasonably handle. Also, is there any way to speed it up?

Thanks, Xiaotong

Command

pval.nb.ihw.refd = ihw(rand.p ~ ref.d, data = pval.nb, alpha = 0.05)

nrow(pval.nb)
[1] 353793300

ihw p-value multiple hypothesis • 892 views

Updated 4.8 years ago by Wolfgang Huber ★ 13k • written 4.9 years ago by xiaotong.yao23 ▴ 10

score 0 · Answer 1 · 2019-06-11

Hi Xiaotong,

For problems of this size you can save a lot of time by running the computation on a subset of the p-values. For example, p-values above 0.01 are very unlikely to be rejected at this scale, and keeping only p-values <= 0.01 effectively reduces the number of p-values by 2 orders of magnitude. This subsetting has to be done carefully, however. Please read Section 4 of the IHW vignette ("Advanced usage: Working with incomplete p-value lists") for an explanation of how to do it.
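As a rough illustration of the subsetting idea, here is a minimal sketch on simulated data. It assumes the approach described in the vignette: bin the covariate on the full data set, record the total number of hypotheses per bin, and pass those counts to `ihw` through its `m_groups` argument together with the filtered p-values. The simulated data, the column names, the number of bins and the 0.01 threshold are all illustrative assumptions; please check the vignette for the exact recommended usage.

```r
set.seed(1)

# Simulated stand-in for the real data (all values here are made up).
n <- 100000
sim <- data.frame(covariate = runif(n, 0, 2.5))
# About 10% of hypotheses get an effect that grows with the covariate.
is_alt <- rbinom(n, 1, 0.1) == 1
z <- rnorm(n, mean = ifelse(is_alt, sim$covariate, 0))
sim$pvalue <- pnorm(z, lower.tail = FALSE)

# 1. Bin the covariate on the FULL data set, before any filtering.
nbins <- 10
breaks <- quantile(sim$covariate, probs = seq(0, 1, length.out = nbins + 1))
sim$group <- cut(sim$covariate, breaks = breaks, include.lowest = TRUE)

# 2. Record how many hypotheses each bin contains in total.
m_groups <- table(sim$group)

# 3. Keep only the small p-values (the factor levels are preserved).
threshold <- 0.01
kept <- sim[sim$pvalue <= threshold, ]

# 4. Run IHW on the subset, passing the full per-bin counts.
#    Only executed if the IHW package is installed.
if (requireNamespace("IHW", quietly = TRUE)) {
  res <- IHW::ihw(kept$pvalue, kept$group, alpha = 0.05, m_groups = m_groups)
  print(IHW::rejections(res))
}
```

The important point is the order of the steps: the bins and their sizes come from all hypotheses, so the procedure still knows how many tests were performed even though it only sees the small p-values.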

Hope this helps! Nikos

score 0 · Answer 2 · 2019-06-15

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 10 days ago

EMBL European Molecular Biology Laborat…

In addition, you can estimate the run time empirically: randomly sample your hypotheses at several sample sizes, measure the run time (e.g. with system.time or the microbenchmark package), plot time against the number of hypotheses, see that the relationship is linear, measure the slope, and predict the run time for your problem size. Nikos' solution is of course more elegant.
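The extrapolation could be sketched roughly as follows. The helpers `time_by_size` and `predict_runtime` are hypothetical names introduced for this example, the data are simulated, and the timed workload is a cheap stand-in (sorting), not IHW itself.

```r
# Hypothetical helper: time `fun` on random row subsets of increasing size.
time_by_size <- function(data, sizes, fun) {
  elapsed <- vapply(sizes, function(n) {
    idx <- sample.int(nrow(data), n)
    system.time(fun(data[idx, , drop = FALSE]))[["elapsed"]]
  }, numeric(1))
  data.frame(n = sizes, seconds = elapsed)
}

# Hypothetical helper: fit a straight line to the timings and extrapolate.
# This is only meaningful if the plot of seconds against n looks linear.
predict_runtime <- function(timings, n_full) {
  fit <- lm(seconds ~ n, data = timings)
  unname(predict(fit, newdata = data.frame(n = n_full)))
}

set.seed(1)
# Simulated stand-in data; column names follow the question.
toy <- data.frame(rand.p = runif(2e5), ref.d = runif(2e5))
sizes <- c(5e4, 1e5, 2e5)

# Stand-in workload: sorting the rows by p-value.
timings <- time_by_size(toy, sizes, function(d) d[order(d$rand.p), ])
print(timings)

# Predicted seconds for a data set 100 times larger than `toy`.
print(predict_runtime(timings, n_full = nrow(toy) * 100))
```

For the real problem you would replace the stand-in workload with something like function(d) ihw(rand.p ~ ref.d, data = d, alpha = 0.05), choose sample sizes that finish in a reasonable time, and look at plot(timings) before trusting the linear fit.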
