AnnotationHub timeout for Ensembl Mouse GTF
dmgatti

I've been trying to download the Ensembl 93 mouse GTF for two days, but I keep getting a timeout error. I get the same error with Ensembl 91 and 92, yet I can download older GTFs (e.g., version 79). Does anyone have any insight into what's going on? Code and sessionInfo() below.

> library(AnnotationHub)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply,
    parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval,
    evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: BiocFileCache
Loading required package: dbplyr

> hub = AnnotationHub()
snapshotDate(): 2019-10-29

> ensembl = hub[["AH63799"]]
downloading 1 resources
retrieving 1 resource
Error: failed to load resource
  name: AH63799
  title: Mus_musculus.GRCm38.93.gtf
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://annotationhub.bioconductor.org/fetch/70545’
  local file path: ‘/home/dmgatti/.cache/AnnotationHub/2b971f67ff73_70545’
  reason: Timeout was reached: [ftp.ensembl.org] Connection time-out 
2: bfcadd() failed; resource removed
  rid: BFC10
  fpath: ‘https://annotationhub.bioconductor.org/fetch/70545’
  reason: download failed 
3: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/70545’
  cache resource: ‘AH63799 : 70545’
  reason: bfcadd() failed; see warnings() 

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.18.0 BiocFileCache_1.10.2 dbplyr_1.4.2         BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3                    later_1.0.0                   pillar_1.4.2                 
 [4] compiler_3.6.1                BiocManager_1.30.10           tools_3.6.1                  
 [7] zeallot_0.1.0                 digest_0.6.23                 bit_1.1-14                   
[10] RSQLite_2.1.4                 memoise_1.1.0                 tibble_2.1.3                 
[13] pkgconfig_2.0.3               rlang_0.4.2                   shiny_1.4.0                  
[16] DBI_1.0.0                     rstudioapi_0.10               curl_4.3                     
[19] yaml_2.2.0                    fastmap_1.0.1                 dplyr_0.8.3                  
[22] httr_1.4.1                    IRanges_2.20.1                vctrs_0.2.0                  
[25] S4Vectors_0.24.1              rappdirs_0.3.1                stats4_3.6.1                 
[28] bit64_0.9-7                   tidyselect_0.2.5              Biobase_2.46.0               
[31] glue_1.3.1                    R6_2.4.1                      AnnotationDbi_1.48.0         
[34] purrr_0.3.3                   blob_1.2.0                    magrittr_1.5                 
[37] promises_1.1.0                backports_1.1.5               htmltools_0.4.0              
[40] assertthat_0.2.1              xtable_1.8-4                  mime_0.7                     
[43] interactiveDisplayBase_1.24.0 httpuv_1.5.2                  crayon_1.3.4                 
[46] BiocVersion_3.10.1 

It depends on what you need the GTF for. If you're looking for Ensembl-based annotations, you could load one of the EnsDb databases instead; I build them for all species and all recent Ensembl releases:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-10-29
> query(ah, "EnsDb.Mmusculus")
AnnotationHub with 12 records
# snapshotDate(): 2019-10-29 
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53222"]]' 

            title                            
  AH53222 | Ensembl 87 EnsDb for Mus Musculus
  AH53726 | Ensembl 88 EnsDb for Mus Musculus
  ...       ...                              
  AH73905 | Ensembl 97 EnsDb for Mus musculus
  AH75036 | Ensembl 98 EnsDb for Mus musculus
> query(ah, "EnsDb.Mmusculus.v93")
AnnotationHub with 1 record
# snapshotDate(): 2019-10-29 
# names(): AH64461
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2018-08-20
# $title: Ensembl 93 EnsDb for Mus Musculus
# $description: Gene and protein annotations for Mus Musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein",
#   "Annotation", "93", "AHEnsDbs") 
# retrieve record with 'object[["AH64461"]]' 
> edb <- ah[["AH64461"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
require(“ensembldb”)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.4
|Creation time: Tue Dec 11 15:02:39 2018
|ensembl_version: 93
|ensembl_host: localhost
|Organism: Mus musculus
|taxonomy_id: 10090
|genome_build: GRCm38
|DBSCHEMAVERSION: 2.1
| No. of genes: 55029.
| No. of transcripts: 138532.
|Protein data available.
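
For pipelines that expect GRanges (as importing a GTF would give), here is a minimal sketch of pulling ranges out of the EnsDb and overlapping them with other features. `my_features` is a hypothetical GRanges made up for illustration; note that EnsDb objects use Ensembl-style sequence names ("1", "X") rather than "chr1".

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
edb <- ah[["AH64461"]]   # Ensembl 93 EnsDb for mouse, as retrieved above

# Gene- and exon-level annotations as GRanges objects
gns <- genes(edb)
exs <- exons(edb)

# Hypothetical query region on chromosome 1 (illustration only)
my_features <- GenomicRanges::GRanges(
    seqnames = "1",
    ranges = IRanges::IRanges(start = 3e6, end = 4e6)
)

# Genes overlapping the query region
IRanges::subsetByOverlaps(gns, my_features)
```

`transcripts(edb)` works the same way if transcript-level ranges are needed.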

I'm happy to try the EnsDb databases. I just wanted to use what I've been using in my pipeline. I mostly use the GRanges object to do overlaps with other features. I'll see if this works. Thanks for pointing it out.

However, the cause of the AnnotationHub timeouts would be good to know if anyone else has any insight...
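
One way to narrow this down, as a sketch: the warning mentions ftp.ensembl.org, which suggests the hub's fetch URL points on to Ensembl's FTP server rather than serving the file itself. Subsetting the hub with a single bracket returns the record's metadata without triggering a download, so the source location can be checked directly. The interpretation of the fields is an assumption based on the error message above.

```r
library(AnnotationHub)

hub <- AnnotationHub()

# Single brackets subset the hub's metadata; nothing is downloaded here
rec <- hub["AH63799"]

rec$title       # title of the record
rec$sourceurl   # where the resource originally comes from
rec$rdatapath   # path of the resource relative to its hosting location
```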


Can you download the file directly (e.g., via your browser) from https://annotationhub.bioconductor.org/fetch/70545? If that works, can you try downloading the resource the way AnnotationHub does:

url = "https://annotationhub.bioconductor.org/fetch/70545"
response = httr::GET(url, httr::write_disk(tempfile()), httr::progress(con = stderr()))
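
If the plain call above also times out, a variation like the following may help tell a slow connection from a blocked one. It is only a sketch: `fetch_with_timeout()` is a hypothetical helper, and the assumption is that the failure is a connection timeout that a longer limit (libcurl's `connecttimeout` option, in seconds) might get past.

```r
library(httr)

url <- "https://annotationhub.bioconductor.org/fetch/70545"
dest <- tempfile()

# Hypothetical helper: GET with a longer connection timeout; returns NULL
# (after printing the error message) instead of stopping on failure
fetch_with_timeout <- function(url, dest, seconds = 120) {
    tryCatch(
        GET(
            url,
            write_disk(dest, overwrite = TRUE),
            progress(con = stderr()),
            config(connecttimeout = seconds)
        ),
        error = function(e) {
            message("download failed: ", conditionMessage(e))
            NULL
        }
    )
}

response <- fetch_with_timeout(url, dest)
if (!is.null(response)) {
    print(status_code(response))   # status reported for the final request
    print(response$url)            # final URL after any redirects
    print(file.size(dest))         # bytes written to disk
}
```

If the error persists even with a generous limit, a firewall or proxy blocking the connection to ftp.ensembl.org is a more likely culprit than a slow server.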