Not able to download recount3 data
Niveditha (@fab15350)

Hi,

I'm using the following code to fetch data from recount3:

library("recount3")

## Find all available human projects
human_projects <- available_projects()

# Error generated
Error in file(file, "rt") : invalid 'description' argument
In addition: Warning messages:
1: The 'url' <http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/sra.recount_project.MD.gz> does not exist or is not available. 
2: The 'url' <http://duffel.rail.bio/recount3/human/data_sources/gtex/metadata/gtex.recount_project.MD.gz> does not exist or is not available. 
3: The 'url' <http://duffel.rail.bio/recount3/human/data_sources/tcga/metadata/tcga.recount_project.MD.gz> does not exist or is not available.

sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C               LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8     LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8    LC_PAPER=en_IN.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_IN.UTF-8
[12] LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] psych_2.2.9                 recount_1.22.0              recount3_1.6.0              tximport_1.24.0             KEGG.db_1.0                 rms_6.3-0                   SparseM_1.81                Hmisc_4.7-1                 Formula_1.2-4               survival_3.4-0              lattice_0.20-45            
[12] annotate_1.74.0             XML_3.99-0.11               ggrepel_0.9.1               wordcloud_2.6               RColorBrewer_1.1-3          enrichR_3.1                 SummarizedBenchmark_2.14.0  sessioninfo_1.2.2           digest_0.6.30               mclust_5.4.10               rlang_1.0.6                
[23] UpSetR_1.4.0                org.Hs.eg.db_3.15.0         AnnotationDbi_1.58.0        edgeR_3.38.4                limma_3.52.4                sva_3.44.0                  BiocParallel_1.30.4         genefilter_1.78.0           mgcv_1.8-40                 nlme_3.1-159                dendextend_1.16.0          
[34] reshape2_1.4.4              gridExtra_2.3               ggExtra_0.10.0              forcats_0.5.2               purrr_0.3.5                 readr_2.1.3                 tidyr_1.2.1                 tibble_3.1.8                tidyverse_1.3.2             ggvenn_0.1.9                ggplot2_3.3.6              
[45] dplyr_1.0.10                DBI_1.1.3                   rlist_0.4.6.2               crayon_1.5.2                stringr_1.4.1               yaml_2.3.6                  DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1        matrixStats_0.62.0         
[56] GenomicRanges_1.48.0        GenomeInfoDb_1.32.4         IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0         data.table_1.14.4          

loaded via a namespace (and not attached):
  [1] utf8_1.2.2               R.utils_2.12.0           tidyselect_1.2.0         RSQLite_2.2.18           htmlwidgets_1.5.4        munsell_0.5.0            codetools_0.2-18         rentrez_1.2.3            interp_1.1-3             miniUI_0.1.1.1           withr_2.5.0              colorspace_2.0-3        
 [13] filelock_1.0.2           knitr_1.40               rstudioapi_0.14          GenomeInfoDbData_1.2.8   mnormt_2.1.1             bit64_4.0.5              downloader_0.4           vctrs_0.5.0              generics_0.1.3           TH.data_1.1-1            xfun_0.34                BiocFileCache_2.4.0     
 [25] R6_2.5.1                 locfit_1.5-9.6           bitops_1.0-7             cachem_1.0.6             DelayedArray_0.22.0      assertthat_0.2.1         promises_1.2.0.1         BiocIO_1.6.0             scales_1.2.1             multcomp_1.4-20          nnet_7.3-17              derfinder_1.30.0        
 [37] googlesheets4_1.0.1      gtable_0.3.1             sandwich_3.0-2           MatrixModels_0.5-1       splines_4.2.0            rtracklayer_1.56.1       gargle_1.2.1             GEOquery_2.64.2          broom_1.0.1              checkmate_2.1.0          modelr_0.1.9             GenomicFeatures_1.48.4  
 [49] backports_1.4.1          httpuv_1.6.6             qvalue_2.28.0            tools_4.2.0              ellipsis_0.3.2           Rcpp_1.0.9               plyr_1.8.7               progress_1.2.2           base64enc_0.1-3          zlibbioc_1.42.0          RCurl_1.98-1.9           prettyunits_1.1.1       
 [61] rpart_4.1.16             deldir_1.0-6             viridis_0.6.2            bumphunter_1.38.0        GenomicFiles_1.32.1      zoo_1.8-11               haven_2.5.1              cluster_2.1.4            fs_1.5.2                 magrittr_2.0.3           reprex_2.0.2             googledrive_2.0.0       
 [73] mvtnorm_1.1-3            hms_1.1.2                mime_0.12                xtable_1.8-4             jpeg_0.1-9               readxl_1.4.1             compiler_4.2.0           biomaRt_2.52.0           R.oo_1.25.0              htmltools_0.5.3          later_1.3.0              tzdb_0.3.0              
 [85] geneplotter_1.74.0       lubridate_1.8.0          dbplyr_2.2.1             MASS_7.3-58.1            rappdirs_0.3.3           Matrix_1.5-1             cli_3.4.1                R.methodsS3_1.8.2        derfinderHelper_1.30.0   parallel_4.2.0           pkgconfig_2.0.3          GenomicAlignments_1.32.1
 [97] foreign_0.8-82           foreach_1.5.2            xml2_1.3.3               rngtools_1.5.2           XVector_0.36.0           rvest_1.0.3              doRNG_1.8.2              VariantAnnotation_1.42.1 Biostrings_2.64.1        cellranger_1.1.0         htmlTable_2.4.1          restfulr_0.0.15         
[109] curl_4.3.3               shiny_1.7.3              Rsamtools_2.12.0         quantreg_5.94            rjson_0.2.21             lifecycle_1.0.3          jsonlite_1.8.3           BSgenome_1.64.0          viridisLite_0.4.1        fansi_1.0.3              pillar_1.8.1             KEGGREST_1.36.3         
[121] fastmap_1.1.0            httr_1.4.4               glue_1.6.2               iterators_1.0.14         png_0.1-7                bit_4.0.4                stringi_1.7.8            blob_1.2.3               polspline_1.1.20         latticeExtra_0.6-30      memoise_2.0.1

This code used to work, but it has recently started failing with the error above.

@lcolladotor

Hi,

What happens if you try to access https://duffel.rail.bio/recount3/human/homes_index in your browser, using the same Wi-Fi/internet connection from which you couldn't download the files?

If that doesn't work, what happens if you run this in your terminal:

curl -v https://duffel.rail.bio/recount3/human/homes_index

It's possible that you have been blocked by IT at JHU, but we'll see. See https://github.com/LieberInstitute/recount3/issues/29 for more details.

Best, Leo

Niveditha (@fab15350)

Thank you, Leo, for getting back to me on this.

I'm able to access https://duffel.rail.bio/recount3/human/homes_index from my Google Chrome web browser (snapshot below). The URL is redirected to https://sciserver.org/public-data/recount3/data/human/homes_index.

[Screenshot: results from web browser]

Here is the output of curl -v https://duffel.rail.bio/recount3/human/homes_index:

*   Trying 104.16.243.78:443...
* TCP_NODELAY set
*   Trying 2606:4700::6810:f44e:443...
* TCP_NODELAY set
* Immediate connect fail for 2606:4700::6810:f44e: Network is unreachable
*   Trying 2606:4700::6810:f34e:443...
* TCP_NODELAY set
* Immediate connect fail for 2606:4700::6810:f34e: Network is unreachable
* Connected to duffel.rail.bio (104.16.243.78) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /home/niveditha/softwares/anaconda3/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=duffel.rail.bio
*  start date: Oct 20 00:00:00 2022 GMT
*  expire date: Oct 20 23:59:59 2023 GMT
*  subjectAltName: host "duffel.rail.bio" matched cert's "duffel.rail.bio"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
> GET /recount3/human/homes_index HTTP/1.1
> Host: duffel.rail.bio
> User-Agent: curl/7.68.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Date: Fri, 23 Dec 2022 04:53:40 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< location: https://sciserver.org/public-data/recount3/data/human/homes_index
< x-do-app-origin: 607188ed-e69e-11ec-b1dc-0c42a19a82a7
< cache-control: private
< x-do-orig-status: 302
< CF-Cache-Status: MISS
< Server: cloudflare
< CF-RAY: 77de7eacbdd4f323-BOM
< 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
* Connection #0 to host duffel.rail.bio left intact
<p>You should be redirected automatically to target URL: <a href="https://sciserver.org/public-data/recount3/data/human/homes_index">https://sciserver.org/public-data/recount3/data/human/homes_index</a>.  If not click the link.

Additionally, I ran the following in R:

library("curl")
con <- curl("https://sciserver.org/public-data/recount3/data/human/homes_index")
con

A connection with                                                                               
description "https://sciserver.org/public-data/recount3/data/human/homes_index"
class       "curl"                                                             
mode        "r"                                                                
text        "text"                                                             
opened      "closed"                                                           
can read    "yes"                                                              
can write   "no"
open(con)

Error in open.connection(con) : error:1414D172:SSL routines:tls12_check_peer_sigalg:wrong signature type

traceback()
2: open.connection(con)
1: open(con)
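A connection object only surfaces the raw OpenSSL error once it is opened. As a minimal diagnostic sketch (the helper name fetch_status() is hypothetical, not part of recount3 or curl), the same request can be wrapped with curl::curl_fetch_memory(), which uses the same libcurl build as curl() connections and follows redirects. It reports either the HTTP status and final URL, or the error message:

```r
library("curl")

# Hypothetical helper: fetch a URL and summarise the outcome instead of erroring.
# curl_fetch_memory() uses the same libcurl/TLS build as curl() connections.
fetch_status <- function(url) {
  tryCatch({
    res <- curl::curl_fetch_memory(url)
    # res$url is the effective URL after any redirects were followed
    sprintf("HTTP %d from %s", as.integer(res$status_code), res$url)
  }, error = function(e) {
    # DNS, connection and TLS failures (e.g. "wrong signature type") land here
    paste("Request failed:", conditionMessage(e))
  })
}

fetch_status("https://duffel.rail.bio/recount3/human/homes_index")
fetch_status("https://sciserver.org/public-data/recount3/data/human/homes_index")
```

Comparing the two lines of output shows whether the failure happens at the duffel redirect or only at the SciServer host.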

@lcolladotor

Hi,

Thank you for your interest in recount3 (and recount2).

We've been documenting this issue at https://github.com/LieberInstitute/recount3/issues/29. The IDIES link you were using recently changed to https://sciserver.org/public-data/recount3/data. However, we now also have a new (second) host, thanks to AWS, at https://registry.opendata.aws/recount/. This is now the default host used by our load balancer duffel (https://github.com/nellore/digitalocean-duffel). recount3 version 1.9.1 documents these new hosts: https://github.com/LieberInstitute/recount3/commit/6cf18f316123695b6a93c2049ab499b00d6c2acf.

Note that you need TLS 1.2 or newer, which most systems already support. If you encounter any new issues, please let us know at https://github.com/LieberInstitute/recount3/issues.

Please help us share this announcement.

Thanks! Leo


Thank you for the update, Leo. I'll check on this and get back to you if the issue persists.


curl command: curl https://www.sciserver.org/public-data/recount3/data/human/homes_index

The command works on a system with the following curl version:

curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.53.1 zlib/1.2.11 libidn/1.28 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

The same command fails with the error "wrong signature type" when run with the following curl version:

curl 7.83.1 (x86_64-conda-linux-gnu) libcurl/7.83.1 OpenSSL/3.0.7 zlib/1.2.12 libssh2/1.10.0 nghttp2/1.47.0
Release-Date: 2022-05-11
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM NTLM_WB SPNEGO SSL TLS-SRP UnixSockets
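The two curl builds above differ in their TLS library (NSS 3.53.1 in the working build, OpenSSL 3.0.7 in the failing conda build), so it may help to check which curl binary the shell picks up first and which TLS library R's own curl package is linked against. A minimal sketch using Sys.which() and curl::curl_version(); the helper name describe_tls_backend() is hypothetical:

```r
library("curl")

# Hypothetical helper: report which curl binary the shell would run and which
# libcurl/TLS library the R 'curl' package is linked against.
describe_tls_backend <- function() {
  v <- curl::curl_version()
  c(
    curl_binary = unname(Sys.which("curl")),  # "" if no curl is on the PATH
    libcurl     = v$version,
    tls_library = v$ssl_version
  )
}

print(describe_tls_backend())
```

If the curl binary resolves to an Anaconda path while R reports a different TLS library, the terminal test and the R session are not exercising the same TLS stack, which is worth keeping in mind when comparing results.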

Could you please help me with this?


I'm not sure whether I should open a ticket for this issue at the GitHub link provided in the previous comment. Could you please confirm?


Please don't reopen that issue thread, as it will ping lots of people.

Can you try using the newly documented Amazon AWS mirror?

library("recount3")

options(recount3_url = "https://recount-opendata.s3.amazonaws.com/recount3/release")

human_projects <- available_projects()

Best, Leo
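The same option can also be chosen programmatically. The sketch below is a hypothetical helper (pick_recount3_url() is not part of recount3) that tries each host mentioned in this thread in turn and sets recount3_url to the first one whose human/homes_index file responds without an HTTP error. It assumes every mirror exposes that file at the same relative path, as the duffel redirect earlier in this thread suggests, and the order of preference is an assumption too:

```r
library("httr")

# Hosts mentioned in this thread, in an assumed order of preference.
recount3_hosts <- c(
  "https://recount-opendata.s3.amazonaws.com/recount3/release",  # AWS mirror
  "https://sciserver.org/public-data/recount3/data",              # IDIES/SciServer
  "http://duffel.rail.bio/recount3"                                # duffel load balancer
)

# Hypothetical helper: return the first host whose human/homes_index responds
# without an HTTP error, or NA if none do. Connection errors count as failures.
pick_recount3_url <- function(hosts = recount3_hosts) {
  for (host in hosts) {
    probe <- paste0(host, "/human/homes_index")
    ok <- tryCatch(!httr::http_error(probe), error = function(e) FALSE)
    if (isTRUE(ok)) return(host)
  }
  NA_character_
}

host <- pick_recount3_url()
if (!is.na(host)) {
  # recount3::available_projects() will then read from this host
  options(recount3_url = host)
}
```

Hard-coding the AWS URL, as in the reply above, is simpler; the loop only adds value if one of the hosts is unreachable from a particular network.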


Thank you so much, Leo. This works.


Cool =) No problem.

@lcolladotor

Hi again,

If you are a Windows user, duffel now fully works on that operating system: the duffel access issue has been addressed by switching internally from RCurl::url.exists() to httr::http_error(). You can get these updates by installing recount3 version 1.10.2 (bioc-release, aka Bioconductor 3.17) or 1.11.2 (bioc-devel, aka Bioconductor 3.18).

duffel currently points to https://registry.opendata.aws/recount/ instead of IDIES.
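For readers curious about the kind of check involved, here is a minimal sketch of a URL-availability test built on httr::http_error(); it is an illustration, not recount3's internal code, and url_available() is a hypothetical name. Given a character URL, http_error() sends a HEAD request and returns TRUE for 4xx/5xx status codes, while connection-level failures (DNS, TLS) raise an R error, which the sketch treats as "unavailable":

```r
library("httr")

# Hypothetical helper: TRUE if a HEAD request to `url` succeeds with a
# non-error status code, FALSE otherwise.
url_available <- function(url) {
  tryCatch(
    !httr::http_error(url),
    error = function(e) FALSE  # DNS/TLS/connection failures
  )
}

# http_error() also accepts integer status codes directly:
httr::http_error(404L)  # TRUE
httr::http_error(200L)  # FALSE

url_available("http://duffel.rail.bio/recount3/human/homes_index")
```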

Best, Leo
