IHW scale of problem
@xiaotongyao23-14986
Last seen 2.4 years ago

Hi,

I'm trying to use the IHW package to compute covariate-weighted p-values for about 35 million hypothesis tests, but the function has been running for 5 hours and has neither returned nor thrown an error. Is 35 million stretching its scale, and what is the maximum number of hypotheses it can reasonably handle? Also, is there any way to speed it up?

Thanks, Xiaotong

Command

pval.nb.ihw.refd = ihw(rand.p ~ ref.d, data = pval.nb, alpha = 0.05)

nrow(pval.nb)
[1] 353793300

ihw p-value multiple hypothesis • 294 views
@nikos-ignatiadis-8823
Last seen 2.4 years ago
Heidelberg

Hi Xiaotong,

For problems this large, you can save a lot of time by running the computation on a subset of the p-values. For example, p-values >= 0.01 are very unlikely to be rejected at this scale, while keeping only p-values <= 0.01 reduces the number of p-values by about two orders of magnitude. However, this subsetting has to be done carefully. Please read Section 4 in the IHW vignette ("Advanced usage: Working with incomplete p-value lists") for an explanation of how this may be achieved.
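
A minimal sketch of that workflow, as I understand it from the vignette, might look like the following. The data are simulated stand-ins (only the column names rand.p and ref.d come from the question), and the choice of 10 bins and the 0.01 cutoff are assumptions for illustration. The key point is that the covariate is binned and the per-bin counts are recorded on the full data set, before any p-values are dropped, and those counts are then passed via m_groups.

```r
library(IHW)

# Simulated stand-in for pval.nb; only the column names come from the question.
set.seed(1)
n <- 1e5
is_alt <- rbinom(n, 1, 0.05) == 1
pval.nb <- data.frame(
  rand.p = ifelse(is_alt, rbeta(n, 0.2, 1), runif(n)),
  ref.d  = rexp(n)
)

# 1. Bin the continuous covariate on the FULL data set, before filtering.
#    (10 bins is an arbitrary choice for this sketch.)
pval.nb$group <- groups_by_filter(pval.nb$ref.d, nbins = 10)

# 2. Record how many hypotheses fall in each bin, again on the full data set.
m_groups <- table(pval.nb$group)

# 3. Keep only the small p-values; factor levels of `group` are preserved.
small <- subset(pval.nb, rand.p <= 0.01)

# 4. Run IHW on the subset, telling it the true per-bin totals.
res <- ihw(small$rand.p, small$group, alpha = 0.05, m_groups = m_groups)
rejections(res)
```

Hypotheses that were filtered out are simply not rejected; the vignette section explains why the per-bin totals are still needed for the procedure to remain valid.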

Hope this helps! Nikos

@wolfgang-huber-3550
Last seen 11 weeks ago
EMBL European Molecular Biology Laborat…

In addition, you can estimate the run time empirically: randomly sample your hypotheses at several sample sizes, measure the run time of each (e.g. with system.time or the microbenchmark package), and plot time against the number of hypotheses. You should see that the relationship is linear, so you can measure the slope and predict the run time for your problem size. Nikos' solution is of course more elegant.
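
As a rough sketch of that benchmarking idea (with simulated data in place of pval.nb, and sample sizes chosen arbitrarily to keep the run short), one could do something like this. The extrapolation at the end assumes the linear trend continues far beyond the measured range, which is worth checking on the plot before trusting the number.

```r
library(IHW)

# Simulated stand-in for pval.nb; only the column names come from the question.
set.seed(1)
n_total <- 2e5
toy <- data.frame(rand.p = runif(n_total), ref.d = rexp(n_total))

# Sample sizes to benchmark (arbitrary choices for this sketch).
sizes <- c(2e4, 5e4, 1e5, 2e5)

# Time one ihw() call per sample size on a random subset of rows.
elapsed <- sapply(sizes, function(n) {
  idx <- sample(n_total, n)
  system.time(
    ihw(rand.p ~ ref.d, data = toy[idx, ], alpha = 0.05)
  )[["elapsed"]]
})

timing <- data.frame(n = sizes, seconds = elapsed)

# Inspect the relationship visually, then fit a straight line.
plot(seconds ~ n, data = timing)
fit <- lm(seconds ~ n, data = timing)

# Extrapolate to the problem size mentioned in the question (about 35 million).
predicted <- predict(fit, newdata = data.frame(n = 3.5e7))
predicted
```

On real data it is probably worth repeating each size a few times, since single timings can be noisy.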
