Question: ExperimentHub wackiness (which breaks packages that depend upon experimentHub-provided *data pkgs)
Tim Triche (United States) wrote, 14 months ago:

What does this mean? And why is it nearly impossible to get this to work? I have cleared the cache, taken various suggestions from ExperimentHub/BiocManager, and none of it helps. This... kind of sucks, to be honest.

install("sesameData", localHub=TRUE)

Bioconductor version 3.8 (BiocManager 1.30.1), R 3.5.0 (2018-04-23)

Installing package(s) 'sesameData'

trying URL 'https://bioconductor.org/packages/3.8/data/experiment/src/contrib/sesameData_0.99.3.tar.gz'

Content type 'application/x-gzip' length 1529903 bytes (1.5 MB)
==================================================
downloaded 1.5 MB
* installing *source* package ‘sesameData’ ...
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded

Warning: database may not be current
database: ‘/home/tim.triche//.ExperimentHub/experimenthub.sqlite3’
reason: An unknown option was passed in to libcurl
Error: package or namespace load failed for ‘sesameData’:
.onAttach failed in attachNamespace() for 'sesameData', details:
call: value[[3L]](cond)
error: failed to connect
reason: An unknown option was passed in to libcurl

Consider rerunning with 'localHub=TRUE'
Error: loading failed
Execution halted
ERROR: loading failed

* removing ‘/primary/projects/triche/R-tim/x86_64-pc-linux-gnu-library/3.5/sesameData’

Anything immediately obvious that can be done to fix this? (Ideally the solution will not involve "uninstall and reinstall every piece of software and library on the cluster," which was never required for .db0 or experiment packages, regardless of their other faults)

Students in a recent workshop mentioned that using ExperimentHub for data packages was breaking the examples, and now I can replicate this at a higher level (it breaks a working install).


sessionInfo() output breaks the 5000 character limit (of course...) 

```R

R> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
   
Matrix products: default
BLAS: /primary/vari/software/R/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /primary/vari/software/R/R-3.5.0/lib64/R/lib/libRlapack.so
   
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
   
attached base packages:
[1] stats4    parallel  graphics  grDevices datasets  stats     utils
[8] methods   base
   
other attached packages:
 [1] remotes_1.1.1
 [2] minfiData_0.27.0
 [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
 [4] IlluminaHumanMethylation450kmanifest_0.4.0
 [5] FlowSorted.Blood.450k_1.19.0
 [6] minfi_1.27.5
 [7] bumphunter_1.23.0
 [8] locfit_1.5-9.1
 [9] iterators_1.0.10
[10] foreach_1.4.4
[11] Biostrings_2.49.1
[12] XVector_0.21.3
[13] SummarizedExperiment_1.11.6
[14] DelayedArray_0.7.24
[15] BiocParallel_1.15.8
[16] matrixStats_0.54.0
[17] Biobase_2.41.2
[18] GenomicRanges_1.33.13
[19] GenomeInfoDb_1.17.1
[20] IRanges_2.15.16
[21] S4Vectors_0.19.19
[22] BiocGenerics_0.27.1
[23] BiocManager_1.30.1
[24] skeletor_1.0.4
[25] magrittr_1.5
[26] gtools_3.8.1
   
loaded via a namespace (and not attached):
 [1] nlme_3.1-137             bitops_1.0-6             devtools_1.13.6
 [4] bit64_0.9-7              RColorBrewer_1.1-2       progress_1.2.0
 [7] httr_1.3.1               tools_3.5.0              doRNG_1.7.1
[10] nor1mix_1.2-3            R6_2.2.2                 HDF5Array_1.9.7
[13] DBI_1.0.0                withr_2.1.2              tidyselect_0.2.4
[16] prettyunits_1.0.2        base64_2.0               bit_1.1-14
[19] compiler_3.5.0           preprocessCore_1.43.0    xml2_1.2.0
[22] pkgmaker_0.27            rtracklayer_1.41.3       readr_1.1.1
[25] genefilter_1.63.0        quadprog_1.5-5           commonmark_1.5
[28] stringr_1.3.1            digest_0.6.15            Rsamtools_1.33.3
[31] illuminaio_0.23.2        siggenes_1.55.0          GEOquery_2.49.0
[34] pkgconfig_2.0.1          bibtex_0.4.2             limma_3.37.3
[37] rlang_0.2.1              RSQLite_2.1.1            DelayedMatrixStats_1.3.4
[40] bindr_0.1.1              mclust_5.4.1             dplyr_0.7.6
[43] RCurl_1.95-4.11          GenomeInfoDbData_1.1.0   Matrix_1.2-14
[46] Rcpp_0.12.18             Rhdf5lib_1.3.1           stringi_1.2.4
[49] MASS_7.3-49              zlibbioc_1.27.0          rhdf5_2.25.4
[52] plyr_1.8.4               grid_3.5.0               blob_1.1.1
[55] crayon_1.3.4             lattice_0.20-35          splines_3.5.0
# ...character limit!

```

Written 14 months ago by Tim Triche

> install("sesameData", localHub=TRUE) # succeeded with the following sessionInfo, but took a long time at "testing if installed package can be loaded"

> sessionInfo()
R version 3.5.1 Patched (2018-07-21 r74998)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocManager_1.30.1 rmarkdown_1.10

loaded via a namespace (and not attached):
[1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
[5] tools_3.5.1     htmltools_0.3.6 Rcpp_0.12.18    stringi_1.2.4
[9] knitr_1.20      stringr_1.3.1   digest_0.6.15   evaluate_0.11

---

Are you sure your installation is up to date? If it is, library(BiocManager); install() should have nothing to do.

Written 14 months ago by Vincent J. Carey, Jr.

FYI this does not look related to the server updates I mentioned in response to this thread: https://stat.ethz.ch/pipermail/bioc-devel/2018-August/013896.html. I don't think my response has posted to bioc-devel yet but I wanted to close the loop here.

Valerie

Written 14 months ago by Valerie Obenchain

A few thoughts -

install("sesameData", localHub=TRUE) is not a valid call. The localHub=TRUE argument belongs to the ExperimentHub constructor, not to install() or install.packages(): eh = ExperimentHub(localHub=TRUE). I just tried removing the package and reinstalling it with install("sesameData"), and I cannot reproduce this error.

Downloading your files from ExperimentHub requires internet access. With no connection, or a slow one, the download can fail with timeouts. The localHub argument is meant for these cases: if the files were already downloaded, ExperimentHub does not try to use the internet and uses only the files already present in the cache location.
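
To make the distinction concrete, here is a minimal sketch. The helper name open_hub is hypothetical, not part of ExperimentHub. It tries the online hub first and falls back to the local cache via localHub=TRUE, which only helps if the resources were already cached during an earlier, connected session.

```r
# One-time installation; install() itself takes no localHub argument.
# BiocManager::install("sesameData")

# Hypothetical helper: open the hub online, or fall back to the local cache.
open_hub <- function(offline = FALSE) {
  if (offline) {
    # Use only files already present in the cache; no network access.
    return(ExperimentHub::ExperimentHub(localHub = TRUE))
  }
  tryCatch(
    ExperimentHub::ExperimentHub(),
    error = function(e) {
      message("Hub unreachable (", conditionMessage(e), "); using local cache.")
      ExperimentHub::ExperimentHub(localHub = TRUE)
    }
  )
}

# Usage (requires the ExperimentHub package to be installed):
# eh <- open_hub()                 # online if possible
# eh <- open_hub(offline = TRUE)   # cache only
```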

You mentioned a cluster; is there a proxy that may need to be set? Also, if the package was already installed on the cluster, there would be no need to reinstall it or to have your students do so. You mentioned at the end that it breaks a working install. The install step only needs to be performed once; after that you shouldn't need to reinstall, simply load the package with library(sesameData). Regardless, I also tried reinstalling over an existing install and cannot reproduce this error that way either.
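
If a proxy does turn out to be the issue, something along these lines might help. This is only a sketch: the proxy address is a made-up placeholder, and I am assuming that ExperimentHub accepts a "PROXY" option key through the same setExperimentHubOption() setter that sesameData uses for "MAX_DOWNLOADS".

```r
# Hypothetical proxy address; replace with your site's real proxy.
proxy_url <- "http://proxy.example.org:3128"

# Environment variables commonly honoured by libcurl-based downloads in R.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Assumption: "PROXY" is a valid ExperimentHub option key.
# Guarded so the snippet still runs where ExperimentHub is not installed.
if (requireNamespace("ExperimentHub", quietly = TRUE)) {
  ExperimentHub::setExperimentHubOption("PROXY", proxy_url)
}

# After a one-time install, later sessions only need to load the package:
# library(sesameData)
```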

Modified 14 months ago; written 14 months ago by shepherl
Answer: ExperimentHub wackiness (which breaks packages that depend upon experimentHub-pr
Martin Morgan (United States) wrote, 14 months ago:

install("sesameData", localHub=TRUE) seems to be a manufactured command (where did it come from?). I think you're aiming for

BiocManager::install("sesameData")

And what you're trying to do is install a plain-old experiment data package, nothing to do with ExperimentHub.

Written 14 months ago by Martin Morgan

My guess is the message

Consider rerunning with 'localHub=TRUE'

leads to trying this, even if it's not actually relevant.

Written 14 months ago by Mike Smith

Looking at this, sesameData has this in .onAttach

> sesameData:::.onAttach
function (libname, pkgname) 
{
    packageStartupMessage("Loading sesameData.")
    if (has_internet()) {
        suppressMessages(log <- capture.output(sesameDataCacheAll()))
    }
}

and sesameDataCacheAll() has this

> sesameData::sesameDataCacheAll
function () 
{
    setExperimentHubOption(arg = "MAX_DOWNLOADS", 30)
    eh <- query(ExperimentHub(), "sesameData")
    cache(eh)
    TRUE
}

which downloads about 445 MB of data when the package is first attached (i.e., when it is installed and tested). Is this really what you want to do? Are all 445 MB of data required for each use of, presumably, the sesame package? Even if so, I think it would be better to move this command into part of the workflow, or simply let the data be loaded from cache / downloaded as the need comes up in the sesame analysis.

Also, I don't see the value of using localHub = TRUE in the sesameData code; this should really be something the user would specify, maybe by exposing an argument to, e.g., sesameDataList(hub = ExperimentHub()) ...
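
A rough sketch of what that could look like inside sesameData. This is not the package's actual code: the hub argument on both functions is the hypothetical change being proposed, and I am assuming query() and cache() can be reached through AnnotationHub, which ExperimentHub builds on.

```r
# Sketch only: attaching the package prints a message and downloads nothing.
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Loading sesameData.")
}

# Hypothetical signature: the caller chooses the hub, for example
# ExperimentHub(localHub = TRUE) on a machine without internet access.
sesameDataList <- function(hub = ExperimentHub::ExperimentHub()) {
  AnnotationHub::query(hub, "sesameData")
}

# Bulk caching becomes an explicit workflow step instead of a side effect
# of library(sesameData).
sesameDataCacheAll <- function(hub = ExperimentHub::ExperimentHub()) {
  ExperimentHub::setExperimentHubOption(arg = "MAX_DOWNLOADS", 30)
  AnnotationHub::cache(sesameDataList(hub))
  invisible(TRUE)
}

# Usage (requires ExperimentHub; the first call needs network access):
# sesameDataCacheAll()
# eh <- sesameDataList(hub = ExperimentHub::ExperimentHub(localHub = TRUE))
```

With this layout the install-time load test never touches the network, so a flaky connection or a libcurl problem cannot break installation.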

This conversation would be more appropriate for the bioc-devel mailing list.

 

Written 14 months ago by Martin Morgan