Problem fetching resources from AnnotationHub, error 403

Johannes Rainer ★ 2.1k

Dear all,

I'm currently struggling to fetch data from AnnotationHub:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2015-08-26
> query(ah, c("Homo sapiens", "release-81"))
AnnotationHub with 7 records
# snapshotDate(): 2015-08-26
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: FaFile, GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH47963"]]'

            title                                 
  AH47963 | Homo_sapiens.GRCh38.81.gtf            
  AH49183 | Homo_sapiens.GRCh38.cdna.all.fa       
  AH49184 | Homo_sapiens.GRCh38.dna_rm.toplevel.fa
  AH49185 | Homo_sapiens.GRCh38.dna_sm.toplevel.fa
  AH49186 | Homo_sapiens.GRCh38.dna.toplevel.fa   
  AH49187 | Homo_sapiens.GRCh38.ncrna.fa          
  AH49188 | Homo_sapiens.GRCh38.pep.all.fa        

> Dna <- ah[["AH49186"]]
downloading from ‘https://annotationhub.bioconductor.org/fetch/55651’
    ‘https://annotationhub.bioconductor.org/fetch/55652’
retrieving 2 resources
Downloading: 240 B     
Downloading: 240 B     
Error: failed to load 'AnnotationHub' resource
  name: AH49186
  title: Homo_sapiens.GRCh38.dna.toplevel.fa
  reason: 2 resources failed to download
In addition: There were 50 or more warnings (use warnings() to see the first 50)

and some lines from the warnings:

31: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
32: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
33: download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/55651’
  cache path: ‘/Users/jo/~/.AnnotationHub/55651’
  reason: client error: (403) Forbidden
34: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean

My R:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin15.0.0/x86_64 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] Rsamtools_1.22.0     Biostrings_2.38.0    XVector_0.10.0      
[4] GenomicRanges_1.22.0 GenomeInfoDb_1.6.1   IRanges_2.4.1       
[7] S4Vectors_0.8.0      AnnotationHub_2.2.1  BiocGenerics_0.16.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1                  AnnotationDbi_1.32.0        
 [3] magrittr_1.5                 zlibbioc_1.16.0             
 [5] BiocParallel_1.4.0           xtable_1.8-0                
 [7] R6_2.1.1                     stringr_1.0.0               
 [9] httr_1.0.0                   tools_3.2.2                 
[11] Biobase_2.30.0               DBI_0.3.1                   
[13] lambda.r_1.1.7               futile.logger_1.4.1         
[15] htmltools_0.2.6              digest_0.6.8                
[17] interactiveDisplayBase_1.8.0 shiny_0.12.2                
[19] futile.options_1.0.0         bitops_1.0-6                
[21] curl_0.9.3                   RSQLite_1.0.0               
[23] mime_0.4                     stringi_1.0-1               
[25] BiocInstaller_1.20.0         httpuv_1.3.3

Could it be that the servers are down or not accessible?

Thanks, jo

AnnotationHub • 2.3k views

Written 10.3 years ago by Johannes Rainer ★ 2.1k • updated 10.2 years ago by Valerie Obenchain ★ 6.8k

This seems to be specific to the genome FASTA files of this Ensembl release: I can fetch the GTF, and I can also fetch the dna.toplevel.fa file for Ensembl 80. Could the genome FASTA files for Ensembl 81 be corrupt?

Johannes Rainer ★ 2.1k • 10.3 years ago
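
The record-by-record check described in the comment above (the GTF loads, the genome FASTA does not) can be sketched as a small loop that collects an error message per ID instead of stopping at the first failure. This is only a minimal sketch: `try_fetch` and `fake_fetch` are hypothetical names, not part of AnnotationHub, and the stand-in fetcher merely simulates the reported behaviour so the example runs offline.

```r
# Hypothetical helper (not part of AnnotationHub): call `fetch` on each ID and
# report "ok" or the error message instead of stopping at the first failure.
try_fetch <- function(ids, fetch) {
    vapply(ids, function(id) {
        tryCatch({
            fetch(id)
            "ok"
        }, error = function(e) conditionMessage(e))
    }, character(1))
}

# Stand-in fetcher that only simulates the behaviour reported in this thread,
# so the sketch runs without network access; it is not real hub output.
fake_fetch <- function(id) {
    if (id == "AH49186") stop("2 resources failed to download")
    invisible(TRUE)
}

status <- try_fetch(c("AH47963", "AH49186"), fake_fetch)
print(status)
```

Against a real hub, the fetcher could be `function(id) ah[[id]]`, with `ah` created as in the question. Note that this would download every listed file, and the genome FASTA files are large.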

Yes, some of the Ensembl 81 FASTA files were not uploaded successfully; this will be addressed. I wonder if these are actually different from the Ensembl 80 files?

We've also been working on representing these differently, as 2bit files for more robust, compressed manipulation. Any thoughts?

Martin Morgan 25k • 10.3 years ago

Great, thanks!

Indeed, they might be the same as the ones in Ensembl 80... I haven't checked.

Johannes Rainer ★ 2.1k • 10.3 years ago

By the way, are there plans to include more recent Ensembl releases too?

Johannes Rainer ★ 2.1k • 10.3 years ago

Yes, we had hoped to stay current more-or-less immediately, but the hiccup above and other issues have distracted us.

Martin Morgan 25k • 10.3 years ago

Hi Martin,

My understanding is that, for a given organism, the FASTA file changes only when the reference genome build changes. So at each new Ensembl release, the FASTA files change only for those organisms for which Ensembl adopts a new genome build. Some organisms have a very stable reference genome (a new build only every 4 or 5 years), but others don't (a new build every year or more often).

H.

Hervé Pagès 16k • 10.3 years ago

score 1 · Accepted Answer · 2015-11-14

Valerie Obenchain ★ 6.8k

Hi,

Thanks for reporting this. The problem was that metadata for all records was inserted into the database, but not all data files were pushed to their final location (an S3 bucket). This has been fixed, and all Ensembl 81 FASTA files should now be available in release and devel.

Valerie

Valerie Obenchain ★ 6.8k • 10.2 years ago

that rocks! thanks!

cheers, jo

Johannes Rainer ★ 2.1k • 10.2 years ago