I am surprised that bplapply() seems to take a few seconds to detect the number of available cores, whereas multicoreWorkers() and MulticoreParam() are fast when run on their own:
> system.time(BiocParallel::bplapply(1:1e2, function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam()))
   user  system elapsed
  0.060   0.180   4.033
Multicore setup is fast if the number of cores is fixed:
> system.time(BiocParallel::bplapply(1:1e2, function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(1)))
   user  system elapsed
  0.036   0.004   0.042
It is slow if the choice is delegated to multicoreWorkers():
> system.time(BiocParallel::bplapply(1:1e2, function(x) order(rnorm(n=1e3)), BPPARAM = MulticoreParam(multicoreWorkers())))
   user  system elapsed
  0.056   0.140   4.034
But by itself, multicoreWorkers() is fast!
> system.time(multicoreWorkers())
   user  system elapsed
  0.000   0.032   0.037
I am running BiocParallel 1.10.0.
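To narrow down where the time goes, the three steps can be timed separately. This is only a sketch built from the calls already shown above; the variable names (t_detect, t_param, t_apply, res) are mine, and I am assuming that constructing the MulticoreParam does not itself start any workers.

```r
library(BiocParallel)

# Step 1: detect the number of workers on its own.
t_detect <- system.time(n <- multicoreWorkers())[["elapsed"]]

# Step 2: build the parameter object with that worker count.
# Assumption: no worker processes are started at this point.
t_param <- system.time(param <- MulticoreParam(n))[["elapsed"]]

# Step 3: run the same workload as in the timings above.
t_apply <- system.time(
  res <- bplapply(1:1e2, function(x) order(rnorm(n = 1e3)), BPPARAM = param)
)[["elapsed"]]

# Elapsed seconds per step; the largest entry shows which step is slow.
print(c(detect = t_detect, param = t_param, apply = t_apply))
```

If the first two entries are small and the third is large, the delay is in the bplapply() call itself rather than in core detection.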

Indeed, the speed has a lot to do with the session. I tried the bplapply command in a fresh session and it took 0.6 s. Then I loaded GenomicRanges and it took 1.4 s. Then I added SummarizedExperiment, MultiAssayExperiment and rtracklayer, and the elapsed time rose to 2.0, 2.3 and 2.8 s respectively!
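For anyone who wants to repeat this measurement, here is a rough sketch that loads the packages one at a time and re-times the same call after each. The helper time_bplapply() and the vector elapsed are hypothetical names introduced for illustration; packages that are not installed are skipped. It should be run in a fresh R session, and the absolute numbers will differ from machine to machine.

```r
library(BiocParallel)

# Hypothetical helper: time the bplapply() call from the original post
# and return the elapsed wall-clock time in seconds.
time_bplapply <- function(workers = multicoreWorkers()) {
  system.time(
    bplapply(1:1e2, function(x) order(rnorm(n = 1e3)),
             BPPARAM = MulticoreParam(workers))
  )[["elapsed"]]
}

pkgs <- c("GenomicRanges", "SummarizedExperiment",
          "MultiAssayExperiment", "rtracklayer")

# Baseline before any of the extra packages are attached.
elapsed <- c(fresh = time_bplapply())

for (pkg in pkgs) {
  # Skip packages that are not installed on this machine.
  if (!requireNamespace(pkg, quietly = TRUE)) next
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  # Re-time the identical call with one more package attached.
  elapsed[pkg] <- time_bplapply()
}

print(elapsed)
```

Passing a fixed worker count, for example time_bplapply(2), makes it possible to check whether the growth depends on the number of workers.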