Hi,
While working through the Coursera course Bioconductor for Genomic Data Science, I ran into the issue below, which seems to be a bug.
I can't download UCSC 'refGene'.
1. Below is the error message.
> ah <- AnnotationHub()
> ah <- subset(ah, species == "Homo sapiens")
> qhs <- query(ah, "RefSeq")
> qhs
AnnotationHub with 8 records
# snapshotDate(): 2015-08-26
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
# sourcetype
# retrieve records with, e.g., 'object[["AH5040"]]'
title
AH5040 | RefSeq Genes
AH5041 | Other RefSeq
AH5155 | RefSeq Genes
AH5156 | Other RefSeq
AH5306 | RefSeq Genes
AH5307 | Other RefSeq
AH5431 | RefSeq Genes
AH5432 | Other RefSeq
> genes <- qhs[qhs$genome == "hg19" & qhs$title == "RefSeq Genes"]
> genes
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26
# names(): AH5040
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: RefSeq Genes
# $description: GRanges object from UCSC track 'RefSeq Genes'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/re
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: refGene, UCSC, track, Gene, Transcript, Annotation
# retrieve record with 'object[["AH5040"]]'
> genes <- qhs[[1]]
Error in value[[3L]](cond) :
failed to load hub resource ‘RefSeq Genes’ of class GRanges; reason: bad
restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘5040’ has magic number '<meta'
Use of save versions prior to 2 is deprecated
> genes = qhs[[2]]
retrieving 1 resources
|==========================================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> genes
UCSC track 'xenoRefGene'
UCSCData object with 161800 ranges and 5 metadata columns:
2. Below is the sessionInfo() for reference.
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.18.4 AnnotationHub_2.0.3 rtracklayer_1.28.10 GenomicRanges_1.20.6
[5] GenomeInfoDb_1.4.2 IRanges_2.2.7 S4Vectors_0.6.5 BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 AnnotationDbi_1.30.1 XVector_0.8.0
[4] magrittr_1.5 zlibbioc_1.14.0 GenomicAlignments_1.4.1
[7] BiocParallel_1.2.21 xtable_1.7-4 R6_2.1.1
[10] stringr_1.0.0 httr_1.0.0 tools_3.2.1
[13] Biobase_2.28.0 DBI_0.3.1 lambda.r_1.1.7
[16] futile.logger_1.4.1 htmltools_0.2.6 digest_0.6.8
[19] interactiveDisplayBase_1.6.0 shiny_0.12.2 futile.options_1.0.0
[22] bitops_1.0-6 curl_0.9.3 RCurl_1.95-4.7
[25] mime_0.4 RSQLite_1.0.0 stringi_0.5-5
[28] Biostrings_2.36.4 Rsamtools_1.20.4 XML_3.98-1.3
[31] httpuv_1.3.3
Smiles
Works for me with a virtually identical sessionInfo(). Maybe your download got corrupted somehow. Try closing your R session and removing your AnnotationHub cache directory (the directory pointed to by hubCache(ah)). Then try it again.
A subtler approach is to remove the cached file; see ?"cache<-". The database itself can be removed with file.remove(dbfile(qhs)).
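The cache cleanup suggested above can be sketched in base R. This is only an illustration: clear_hub_cache() is a hypothetical helper, not an AnnotationHub function, and it assumes you already noted the directory reported by hubCache(ah) before closing the session.

```r
# clear_hub_cache() is a hypothetical helper, not part of AnnotationHub.
# It removes a cache directory and reports whether the removal worked.
clear_hub_cache <- function(cache_dir) {
  if (!dir.exists(cache_dir)) {
    return(invisible(FALSE))  # nothing to remove
  }
  # unlink() returns 0 on success and 1 on failure
  status <- unlink(cache_dir, recursive = TRUE, force = TRUE)
  invisible(status == 0L && !dir.exists(cache_dir))
}

# Possible use (commented out because it needs AnnotationHub installed):
# In a first session, note the path:
#   library(AnnotationHub)
#   print(hubCache(AnnotationHub()))
# After restarting R:
#   clear_hub_cache("<path printed above>")
```

If removing the whole directory is too drastic, the ?"cache<-" help page mentioned above describes removing only the cached copy of individual records.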
Thanks Martin.
There seem to be a lot of download problems with AnnotationHub. All the reports seem to come from Windows users.
Best,
Kasper
Here is a good session from a user. I'm not sure whether it is the SQLite database or the local cache that is corrupt:
Please see the results below of removing the database (I get the same behavior as before) and the sessionInfo() below that
OK, I think the problem here is that httr maintains a cache of connections, and the connection to AnnotationHub has become stale. I think the workaround is:
httr::handle_reset(paste0(hubUrl(), "/")); file.remove(dbfile(qhs))
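For anyone who wants to script the workaround above, here is a minimal sketch that wraps the same two calls in a function. reset_hub_connection() is a hypothetical name, and the sketch assumes that hubUrl() and dbfile() are reachable through the AnnotationHub namespace and that the database is fetched again when a new hub object is created; neither has been verified on an affected machine.

```r
# reset_hub_connection() is a hypothetical helper that bundles the
# workaround: reset httr's connection handle, then delete the local database.
reset_hub_connection <- function(hub) {
  for (pkg in c("httr", "AnnotationHub")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("package '", pkg, "' is required", call. = FALSE)
    }
  }
  # Discard httr's pooled handle for the hub host so that the next
  # request opens a fresh connection.
  httr::handle_reset(paste0(AnnotationHub::hubUrl(), "/"))
  # Remove the local metadata database of this hub object.
  db_path <- AnnotationHub::dbfile(hub)
  invisible(file.remove(db_path))
}

# Possible use (commented out because it needs network access):
# ah <- AnnotationHub::AnnotationHub()
# reset_hub_connection(ah)
# ah <- AnnotationHub::AnnotationHub()  # create a fresh hub object
# refseq <- ah[["AH5040"]]
```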
Hi Dan,
Thank you for your help.
I retried after deleting the AnnotationHub cache directory that hubCache(ah) points to.
I got the error message below.
>
> ah <- AnnotationHub()
retrieving 1 resources
|====================================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> ah <- subset(ah, species == "Homo sapiens")
> qhs <- query(ah, "RefSeq")
> qhs
AnnotationHub with 8 records
# snapshotDate(): 2015-08-26
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH5040"]]'
title
AH5040 | RefSeq Genes
AH5041 | Other RefSeq
AH5155 | RefSeq Genes
AH5156 | Other RefSeq
AH5306 | RefSeq Genes
AH5307 | Other RefSeq
AH5431 | RefSeq Genes
AH5432 | Other RefSeq
> refseq <- qhs[qhs$genome == "hg19" & qhs$title == "RefSeq Genes"]
> refseq
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26
# names(): AH5040
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: RefSeq Genes
# $description: GRanges object from UCSC track 'RefSeq Genes'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: refGene, UCSC, track, Gene, Transcript, Annotation
# retrieve record with 'object[["AH5040"]]'
> refseq <- refseq[[1]]
retrieving 1 resources
Downloading: 73 B
Error in value[[3L]](cond) :
failed to load hub resource ‘RefSeq Genes’ of class GRanges; reason: bad restore
file magic number (file may be corrupted) -- no data loaded
In addition: There were 38 warnings (use warnings() to see them)
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationHub_2.0.3 rtracklayer_1.28.10 GenomicRanges_1.20.6 GenomeInfoDb_1.4.2
[5] IRanges_2.2.7 S4Vectors_0.6.5 BiocGenerics_0.14.0 BiocInstaller_1.18.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 AnnotationDbi_1.30.1 XVector_0.8.0
[4] magrittr_1.5 zlibbioc_1.14.0 GenomicAlignments_1.4.1
[7] BiocParallel_1.2.21 xtable_1.7-4 R6_2.1.1
[10] stringr_1.0.0 httr_1.0.0 tools_3.2.1
[13] Biobase_2.28.0 DBI_0.3.1 lambda.r_1.1.7
[16] futile.logger_1.4.1 htmltools_0.2.6 digest_0.6.8
[19] interactiveDisplayBase_1.6.0 shiny_0.12.2 futile.options_1.0.0
[22] bitops_1.0-6 curl_0.9.3 RCurl_1.95-4.7
[25] mime_0.4 RSQLite_1.0.0 stringi_0.5-5
[28] Biostrings_2.36.4 Rsamtools_1.20.4 XML_3.98-1.3
[31] httpuv_1.3.3
Smiles
We have still not been able to reproduce these problems. I've tried on a Windows 7 VM.
One thought is that your disk might be full. The problem seems to be happening on multiple machines (all running Windows), so that is unlikely to be the explanation, but it should be ruled out. The file you are trying to download is larger than the 73 B reported in your session.
If that is not the case, then there is something weird with the download and we will continue to investigate.
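To help narrow this down, the file in the cache can be inspected directly. The sketch below uses only base R to report a file's size and its first five bytes, which is the part that load() complains about in the "magic number '<meta'" warning. inspect_download() is a hypothetical helper, and the classification is a rough heuristic: a leading "<" suggests an HTML or XML page was saved instead of data, while gzip bytes or an "RD" prefix are what a file written by save() normally starts with.

```r
# inspect_download() is a hypothetical helper for checking a cached file.
inspect_download <- function(path, n = 5L) {
  stopifnot(file.exists(path))
  # readBin() on a file name opens the file in binary mode, so these are
  # the raw bytes on disk with no decompression applied.
  head_bytes <- readBin(path, what = "raw", n = n)
  kind <- if (length(head_bytes) >= 2L &&
              identical(head_bytes[1:2], as.raw(c(0x1f, 0x8b)))) {
    "gzip"     # compressed file, the default output of save()
  } else if (length(head_bytes) >= 1L && head_bytes[1] == as.raw(0x3c)) {
    "markup"   # starts with "<": probably an HTML or XML page
  } else if (length(head_bytes) >= 2L &&
             identical(head_bytes[1:2], charToRaw("RD"))) {
    "rdata"    # uncompressed save() output
  } else {
    "unknown"
  }
  list(size = file.size(path),
       head = paste(as.character(head_bytes), collapse = " "),  # hex bytes
       kind = kind)
}

# Possible use, assuming the cached file is named "5040" as in the warning:
# inspect_download(file.path(hubCache(ah), "5040"))
```

A very small file whose kind comes back as "markup" would point to an error or redirect page having been stored in place of the resource, rather than to a full disk.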