Dear all,
I'm currently struggling to fetch data from AnnotationHub:
    > library(AnnotationHub)
    > ah <- AnnotationHub()
    snapshotDate(): 2015-08-26
    > query(ah, c("Homo sapiens", "release-81"))
    AnnotationHub with 7 records
    # snapshotDate(): 2015-08-26
    # $dataprovider: Ensembl
    # $species: Homo sapiens
    # $rdataclass: FaFile, GRanges
    # additional mcols(): taxonomyid, genome, description, tags, sourceurl,
    #   sourcetype
    # retrieve records with, e.g., 'object[["AH47963"]]'

                title
      AH47963 | Homo_sapiens.GRCh38.81.gtf
      AH49183 | Homo_sapiens.GRCh38.cdna.all.fa
      AH49184 | Homo_sapiens.GRCh38.dna_rm.toplevel.fa
      AH49185 | Homo_sapiens.GRCh38.dna_sm.toplevel.fa
      AH49186 | Homo_sapiens.GRCh38.dna.toplevel.fa
      AH49187 | Homo_sapiens.GRCh38.ncrna.fa
      AH49188 | Homo_sapiens.GRCh38.pep.all.fa
    > Dna <- ah[["AH49186"]]
    downloading from ‘https://annotationhub.bioconductor.org/fetch/55651’
        ‘https://annotationhub.bioconductor.org/fetch/55652’
    retrieving 2 resources
    Downloading: 240 B
    Downloading: 240 B
    Error: failed to load 'AnnotationHub' resource
      name: AH49186
      title: Homo_sapiens.GRCh38.dna.toplevel.fa
      reason: 2 resources failed to download
    In addition: There were 50 or more warnings (use warnings() to see the first 50)
and some lines from the warnings:
    31: In curl::curl_fetch_disk(url, x$path, handle = handle) :
      progress callback must return boolean
    32: In curl::curl_fetch_disk(url, x$path, handle = handle) :
      progress callback must return boolean
    33: download failed
      hub path: ‘https://annotationhub.bioconductor.org/fetch/55651’
      cache path: ‘/Users/jo/~/.AnnotationHub/55651’
      reason: client error: (403) Forbidden
    34: In curl::curl_fetch_disk(url, x$path, handle = handle) :
      progress callback must return boolean
My R:
    > sessionInfo()
    R version 3.2.2 (2015-08-14)
    Platform: x86_64-apple-darwin15.0.0/x86_64 (64-bit)
    Running under: OS X 10.11.2 (El Capitan)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
    [1] Rsamtools_1.22.0     Biostrings_2.38.0    XVector_0.10.0
    [4] GenomicRanges_1.22.0 GenomeInfoDb_1.6.1   IRanges_2.4.1
    [7] S4Vectors_0.8.0      AnnotationHub_2.2.1  BiocGenerics_0.16.0

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.1                  AnnotationDbi_1.32.0
     [3] magrittr_1.5                 zlibbioc_1.16.0
     [5] BiocParallel_1.4.0           xtable_1.8-0
     [7] R6_2.1.1                     stringr_1.0.0
     [9] httr_1.0.0                   tools_3.2.2
    [11] Biobase_2.30.0               DBI_0.3.1
    [13] lambda.r_1.1.7               futile.logger_1.4.1
    [15] htmltools_0.2.6              digest_0.6.8
    [17] interactiveDisplayBase_1.8.0 shiny_0.12.2
    [19] futile.options_1.0.0         bitops_1.0-6
    [21] curl_0.9.3                   RSQLite_1.0.0
    [23] mime_0.4                     stringi_1.0-1
    [25] BiocInstaller_1.20.0         httpuv_1.3.3
Could it be that the servers are down or not accessible?
Thanks, jo
This seems to be specific to the genome FASTA files of this Ensembl release: I can fetch the GTF, and I can also fetch the dna.toplevel.fa file for Ensembl 80. Could these files (I mean the genome FASTA files for Ensembl 81) be corrupt?
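For anyone who wants to narrow down which records are affected, here is a minimal sketch of how one could probe several records in one go rather than stopping at the first error. `probe_records` and its `fetch` argument are hypothetical names made up for this sketch, not part of AnnotationHub, and `fake_fetch` is only a stand-in so the example runs offline.

```r
# Try to fetch each record and collect the outcome, instead of
# stopping at the first error.
# `fetch` is a hypothetical argument: any function that takes a record ID
# and either returns the resource or signals an error.
probe_records <- function(ids, fetch) {
  status <- vapply(ids, function(id) {
    tryCatch({
      fetch(id)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
  data.frame(id = ids, status = unname(status), stringsAsFactors = FALSE)
}

# Offline stand-in for the hub (an assumption for illustration only):
# it pretends that one record cannot be downloaded.
fake_fetch <- function(id) {
  if (id == "AH49186") stop("2 resources failed to download")
  invisible(id)
}

result <- probe_records(c("AH47963", "AH49186"), fake_fetch)
print(result)
```

Against the real hub one would presumably pass something like `function(id) ah[[id]]` as `fetch`, with `ah` being the hub object from the session above. Note that each successful call downloads the full resource, so this is best tried on a handful of IDs.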
Yes, some of the Ensembl 81 FASTA files were not uploaded successfully; this will be addressed. I wonder if these are actually different from the Ensembl 80 files?
We've also been working on representing these differently, as 2bit files for more robust, compressed manipulation. Any thoughts?
Great, thanks!
Indeed, it might be that they are the same as the ones in Ensembl 80; I haven't checked.
By the way, are there plans to include more recent Ensembl releases too?
Yes, we had hoped to stay current more-or-less immediately, but the hiccup above and other issues have distracted us.
Hi Martin,
My understanding is that, for a given organism, the FASTA file changes only when the reference genome build changes. So at each new Ensembl release, the FASTA files change only for those organisms for which Ensembl switches to a new genome build. Some organisms have a very stable reference genome (a new build only every 4 or 5 years) but others don't (a new build every year or even more often).
H.
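As a rough illustration of the point about genome builds, the sketch below compares the assembly name embedded in two file titles. It assumes titles follow the `<species>.<assembly>.<rest>` pattern seen in the query output above; `assembly_of` and `same_build` are hypothetical helper names, and the GRCh37 title is a made-up example of an older build, not a record taken from the hub.

```r
# Extract the assembly name, assumed to be the second dot-separated
# field of an Ensembl-style file title.
assembly_of <- function(title) {
  vapply(strsplit(title, ".", fixed = TRUE),
         function(parts) parts[2],
         character(1))
}

# Two titles are taken to refer to the same genome build when their
# assembly names match.
same_build <- function(title_a, title_b) {
  assembly_of(title_a) == assembly_of(title_b)
}

# Title taken from the query output in this thread.
current <- "Homo_sapiens.GRCh38.dna.toplevel.fa"
# Hypothetical title for an older build, for illustration only.
older <- "Homo_sapiens.GRCh37.dna.toplevel.fa"

print(assembly_of(current))
print(same_build(current, current))
print(same_build(current, older))
```

A matching assembly name only suggests that the sequence is unchanged; comparing checksums of the actual files would be the more reliable check.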