Calling snpsById with a vector of 4 rsids returns in about 60 seconds.
But calling snpsById 4 times via lapply, once per rsid, returns in less than 7 seconds, and combining the results (in my case, into a data.frame) takes negligible time.
This surprised me.
I realize, though I don't understand the details, that the first invocation of a SNPlocs method takes a long time, a minute or more, perhaps due to loading large amounts of data into memory. For that reason, I call snpsById with the full vector twice below and treat the second call (t1) as the fair comparison against the lapply version (t2).
library(SNPlocs.Hsapiens.dbSNP151.GRCh38)
rsids <- c("rs11576415", "rs11584174", "rs12753774", "rs12754503")
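# First call: warm-up only, presumably slowed by one-time loading.
t0 <- system.time(x0 <- snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsids))
# Second call: one vectorized lookup of all 4 rsids.
t1 <- system.time(x1 <- snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsids))
# Same 4 rsids, looked up one at a time.
t2 <- system.time(x2 <- lapply(rsids,
    function(rsid) snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsid)))
# Combine the per-rsid results into a single data.frame.
do.call(rbind, lapply(x2, as.data.frame))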
#   seqnames       pos strand  RefSNP_id alleles_as_ambig
# 1        1 161212418      * rs11576415                S
# 2        1 161242663      * rs11584174                Y
# 3        1 161271358      * rs12753774                R
# 4        1 161265868      * rs12754503                K
t0
#    user  system elapsed
# 129.208   9.718 141.967
t1
#    user  system elapsed
#  58.031   2.573  61.067
t2
#    user  system elapsed
#   2.954   2.182   5.146
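For anyone who wants to reuse the per-rsid workaround, it can be wrapped in a small helper. This is only a minimal sketch: `lookup_each` and `fake_lookup` are hypothetical names of my own, not part of SNPlocs or BSgenome, and `fake_lookup` is a stand-in so the snippet runs without the SNPlocs package installed.

```r
# Hypothetical helper (not part of SNPlocs or BSgenome): look up ids one at a
# time with `lookup`, then row-bind the per-id results into one data.frame.
lookup_each <- function(ids, lookup) {
  per_id <- lapply(ids, function(id) as.data.frame(lookup(id)))
  do.call(rbind, per_id)
}

# Stand-in lookup (an assumption, for illustration only): returns one row
# per id, with `pos` set to the number of characters in the id.
fake_lookup <- function(id) {
  data.frame(RefSNP_id = id, pos = nchar(id), stringsAsFactors = FALSE)
}

res <- lookup_each(c("rs1", "rs22"), fake_lookup)
```

With the real package loaded, the equivalent call would be `lookup_each(rsids, function(id) snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, id))`, which is the same thing my lapply/do.call lines above do.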
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] SNPlocs.Hsapiens.dbSNP151.GRCh38_0.99.20
[2] BSgenome_1.60.0
[3] rtracklayer_1.52.1
[4] Biostrings_2.60.2
[5] XVector_0.32.0
[6] GenomicRanges_1.44.0
[7] GenomeInfoDb_1.28.4
[8] IRanges_2.26.0
[9] S4Vectors_0.30.0
[10] BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 zlibbioc_1.38.0
[3] GenomicAlignments_1.28.0 BiocParallel_1.26.2
[5] lattice_0.20-44 rjson_0.2.20
[7] tools_4.1.0 grid_4.1.0
[9] SummarizedExperiment_1.22.0 Biobase_2.52.0
[11] matrixStats_0.60.1 yaml_2.2.1
[13] crayon_1.4.1 BiocIO_1.2.0
[15] Matrix_1.3-4 GenomeInfoDbData_1.2.6
[17] restfulr_0.0.13 bitops_1.0-7
[19] RCurl_1.98-1.4 DelayedArray_0.18.0
[21] compiler_4.1.0 MatrixGenerics_1.4.3
[23] Rsamtools_2.8.0 XML_3.99-0.7
Under some circumstances the lookup is so slow as to be unusable:
it does not complete in 30 minutes.