I'm trying to use the IHW package to compute covariate-weighted p-values for about 35 million hypothesis tests, but the function has been running for 5 hours and still hasn't returned or thrown an error. I'm wondering whether 35 million is stretching its scale, and what the maximum number of hypotheses it could reasonably handle would be. Also, is there any way to speed it up?
For such large problems you can get large speed savings by doing the computations on a subset of all p-values (for example, p-values >= 0.01 are very unlikely to be rejected at this scale, so working only with p-values <= 0.01 effectively reduces the number of p-values by two orders of magnitude). However, this subsetting has to be done carefully. Please read Section 4 of the IHW vignette ("Advanced usage: Working with incomplete p-value lists") for an explanation of how this may be achieved.
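A rough sketch of what that might look like, assuming the workflow from the vignette (the data here are simulated stand-ins, and the exact arguments — in particular the use of groups_by_filter and m_groups — should be checked against the vignette before use on real data):

```r
library(IHW)

## Simulated stand-in data; in practice these would be your
## ~35 million p-values and their covariate.
set.seed(1)
m <- 1e6
pvals     <- runif(m)
covariate <- runif(m)

## Bin the covariate into groups yourself, then record how many
## hypotheses fall into each bin, so that after subsetting IHW
## still knows the full multiplicity per group.
nbins    <- 10
groups   <- groups_by_filter(covariate, nbins)
m_groups <- table(groups)

## Keep only the small p-values; the discarded ones are accounted
## for through m_groups rather than being passed in explicitly.
sel <- pvals <= 0.01
ihw_res <- ihw(pvals[sel], groups[sel], alpha = 0.1, m_groups = m_groups)
```

This is only a sketch of the idea; the vignette explains the caveats (e.g. how the censoring threshold interacts with the weighting) that make the careful version correct.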
In addition, you can randomly sample your hypotheses at various sample sizes, measure the run time using e.g. system.time or the microbenchmark package, plot time against the number of hypotheses, check that the relationship is (roughly) linear, estimate the slope, and extrapolate the run time for your full problem size.
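A minimal sketch of that benchmarking idea, again with simulated stand-in data (the subset sizes and the linear fit are illustrative choices, not a prescription):

```r
library(IHW)

## Simulated stand-in for the real p-values and covariate.
set.seed(1)
pvals     <- runif(1e6)
covariate <- runif(1e6)

m_full <- 35e6  # the actual problem size to extrapolate to

## Time ihw() on increasingly large random subsets.
sizes <- c(1e4, 5e4, 1e5, 5e5)
times <- sapply(sizes, function(n) {
  idx <- sample(length(pvals), n)
  system.time(ihw(pvals[idx], covariate[idx], alpha = 0.1))["elapsed"]
})

## Inspect the relationship, fit a line, and extrapolate.
plot(sizes, times, xlab = "number of hypotheses", ylab = "elapsed seconds")
fit <- lm(times ~ sizes)
predict(fit, newdata = data.frame(sizes = m_full))  # projected seconds
```

If the plot is clearly non-linear, the linear extrapolation will of course be misleading, so eyeballing the scatter before trusting the prediction is part of the exercise.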
Nikos' solution is of course more elegant.