AH134 and other AnnotationHub objects unavailable as "defunct"
@robert-k-bradley-5997

Hello,

I discovered that several AnnotationHub objects corresponding to genome assemblies are no longer available after updating to R 3.6.0. Specifically:

> hub = AnnotationHub()
snapshotDate(): 2019-04-29
> hub[["AH134"]]
Error: Defunct
> hub[["AH188"]]
Error: Defunct
> hub[["AH47190"]]
Error: Defunct
> hub[["AH80"]]
Error: Defunct

The above objects correspond to the hg19, mm10, danRer10, and dm5 genome assemblies. My lab relies on these assemblies, which are still commonly used. I'm unsure when these objects became unavailable, but as of R 3.5.3 (and the correspondingly updated versions of Bioconductor packages), I was able to access them via my cached copies. After upgrading to R 3.6.0, however, I can no longer access them even though I have copies in my local cache.

I searched AnnotationHub() for equivalents under other names, but could not find any.

Here is my sessionInfo():

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  grDevices utils     datasets  stats     graphics  methods   base     

other attached packages:
[1] AnnotationHub_2.15.15 BiocFileCache_1.7.10  dbplyr_1.4.0          BiocGenerics_0.29.2   readr_1.3.1           dplyr_0.8.0.1         tibble_2.1.1          magrittr_1.5         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1                    pillar_1.3.1                  compiler_3.6.0                BiocManager_1.30.4            later_0.8.0                  
 [6] tools_3.6.0                   digest_0.6.18                 bit_1.1-14                    RSQLite_2.1.1                 memoise_1.1.0                
[11] pkgconfig_2.0.2               rlang_0.3.4                   shiny_1.3.2                   DBI_1.0.0                     curl_3.3                     
[16] yaml_2.2.0                    httr_1.4.0                    IRanges_2.17.5                S4Vectors_0.21.24             rappdirs_0.3.1               
[21] hms_0.4.2                     stats4_3.6.0                  bit64_0.9-7                   tidyselect_0.2.5              Biobase_2.43.1               
[26] glue_1.3.1                    R6_2.4.0                      AnnotationDbi_1.45.1          purrr_0.3.2                   blob_1.1.1                   
[31] promises_1.0.1                htmltools_0.3.6               assertthat_0.2.1              mime_0.6                      interactiveDisplayBase_1.21.0
[36] xtable_1.8-4                  httpuv_1.5.1                  crayon_1.3.4
@lshep

The above resources were labelled as defunct in the current release due to changes in Rsamtools. The files could no longer be loaded with recent versions and were therefore invalidated.

The information is still available in the hubs, but in a different format. Instead of providing the FaFiles, several releases ago we switched over to providing the 2bit and/or GTF file versions. Perhaps one of those would suffice?

For example:

For human:

> query(ah, c("Homo sapiens", "ensembl", "GRCh37"))
AnnotationHub with 7 records
# snapshotDate(): 2019-04-29 
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH7558"]]' 

            title                     
  AH7558  | Homo_sapiens.GRCh37.70.gtf
  AH7619  | Homo_sapiens.GRCh37.69.gtf
  AH7666  | Homo_sapiens.GRCh37.71.gtf
  AH7726  | Homo_sapiens.GRCh37.72.gtf
  AH7790  | Homo_sapiens.GRCh37.73.gtf
  AH8753  | Homo_sapiens.GRCh37.74.gtf
  AH10684 | Homo_sapiens.GRCh37.75.gtf

For mouse:

> query(ah, c("ensembl", "GRCm38"))
AnnotationHub with 152 records
# snapshotDate(): 2019-04-29 
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: TwoBitFile, GRanges, EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH7567"]]' 

            title                                           
  AH7567  | Mus_musculus.GRCm38.70.gtf                      
  AH7628  | Mus_musculus.GRCm38.69.gtf                      
  AH7675  | Mus_musculus.GRCm38.71.gtf                      
  AH7736  | Mus_musculus.GRCm38.72.gtf                      
  AH7799  | Mus_musculus.GRCm38.73.gtf                      
  ...       ...                                             
  AH70174 | Mus_musculus.GRCm38.cdna.all.2bit               
  AH70175 | Mus_musculus.GRCm38.dna.primary_assembly.2bit   
  AH70176 | Mus_musculus.GRCm38.dna_rm.primary_assembly.2bit
  AH70177 | Mus_musculus.GRCm38.dna_sm.primary_assembly.2bit
  AH70178 | Mus_musculus.GRCm38.ncrna.2bit                  

For zebrafish:

> query(ah, c("ensembl", "GRCz10"))
AnnotationHub with 80 records
# snapshotDate(): 2019-04-29 
# $dataprovider: Ensembl
# $species: Danio rerio
# $rdataclass: TwoBitFile, GRanges, EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH47053"]]' 

            title                                  
  AH47053 | Danio_rerio.GRCz10.80.gtf              
  AH47950 | Danio_rerio.GRCz10.81.gtf              
  AH49657 | Danio_rerio.GRCz10.cdna.all.2bit       
  AH49658 | Danio_rerio.GRCz10.dna_rm.toplevel.2bit
  AH49659 | Danio_rerio.GRCz10.dna_sm.toplevel.2bit
  ...       ...                                    
  AH60362 | Danio_rerio.GRCz10.cdna.all.2bit       
  AH60363 | Danio_rerio.GRCz10.dna_rm.toplevel.2bit
  AH60364 | Danio_rerio.GRCz10.dna_sm.toplevel.2bit
  AH60365 | Danio_rerio.GRCz10.ncrna.2bit          
  AH60762 | Ensembl 91 EnsDb for Danio Rerio

For fruit fly:

> query(ah, c("ensembl", "BDGP5"))
AnnotationHub with 10 records
# snapshotDate(): 2019-04-29 
# $dataprovider: Ensembl
# $species: Drosophila melanogaster
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH7549"]]' 

            title                               
  AH7549  | Drosophila_melanogaster.BDGP5.70.gtf
  AH7610  | Drosophila_melanogaster.BDGP5.69.gtf
  AH7657  | Drosophila_melanogaster.BDGP5.71.gtf
  AH7717  | Drosophila_melanogaster.BDGP5.72.gtf
  AH7780  | Drosophila_melanogaster.BDGP5.73.gtf
  AH8743  | Drosophila_melanogaster.BDGP5.74.gtf
  AH10674 | Drosophila_melanogaster.BDGP5.75.gtf
  AH28596 | Drosophila_melanogaster.BDGP5.78.gtf
  AH28664 | Drosophila_melanogaster.BDGP5.76.gtf
  AH28802 | Drosophila_melanogaster.BDGP5.77.gtf
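
Since the queries above mix GTF annotations with 2bit sequence files, it may help to narrow the results to sequence resources only. The following is a minimal sketch, assuming network access to the hub; the variable names are illustrative and the mouse query is just one of the examples above.

```r
library(AnnotationHub)

hub <- AnnotationHub()

# Same search terms as the mouse example above
mouse <- query(hub, c("ensembl", "GRCm38"))

# Keep only the records that are served as TwoBitFile objects,
# dropping the GTF (GRanges) and EnsDb records
mouse_2bit <- subset(mouse, rdataclass == "TwoBitFile")

# Inspect the remaining IDs and titles
mcols(mouse_2bit)[, "title", drop = FALSE]
```

The same `subset()` call should work for the zebrafish query; for the human and fruit fly queries shown above, the listed `rdataclass` is GRanges only, so the result would be empty.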

Thank you for the reply. Would you mind explaining what changed in Rsamtools to prevent parsing of the FASTA files? I didn't see any changes that I would have expected to cause such a problem from the Rsamtools change log.

Thank you also for pointing me to the GTF and 2bit files. The GTF files correspond to genome annotations rather than primary genome sequence. I might be able to adapt code to use the 2bit files; however, is it correct that those aren't available for the hg19 / GRCh37 assembly?

Thank you for the reference. Is the problem that AH134 and other AnnotationHub objects were compressed with razip instead of bgzip? If so, would it be possible to update those files to be stored with bgzip? Many people use those assemblies--some of which correspond to the latest version of the genome assembly available--and so I think that such an update would help many people.

Yes, the problem was that those objects were compressed with razip. We are discussing how to proceed and will hopefully have a solution soon.
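
For anyone who has the original FASTA on hand and wants a local workaround in the meantime, the bgzip route can be sketched as below. This is only an illustration on a toy file (the sequence name `chrToy` and the temporary paths are made up for the example), not a description of how the hub resources themselves are built.

```r
library(Rsamtools)
library(GenomicRanges)

# Toy FASTA standing in for a genome file (hypothetical content)
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">chrToy",
             "ACGTACGTACGTACGTACGT",
             "TTTTGGGGCCCCAAAA"), fa_path)

# Compress with bgzip (block gzip) rather than razip
bgz_path <- bgzip(fa_path, dest = paste0(fa_path, ".bgz"), overwrite = TRUE)

# Build the FASTA index for the compressed file, then open it as a FaFile
indexFa(bgz_path)
fa <- FaFile(bgz_path)

# Random access to a region of the compressed FASTA
toy_seq <- getSeq(fa, GRanges("chrToy", IRanges(start = 5, end = 12)))
toy_seq
```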

That's wonderful to hear. Thank you! I think that a solution will benefit many people (and my own group as well).

Would it be possible to create versions that are compressed with bgzip instead?

As mentioned above, we started providing 2bit files instead of bgzip resources. We recommend switching to the 2bit files, as this is what Bioconductor will provide by default.

See the post above for the mouse and zebrafish options.

It seems like the following could be used for the Homo sapiens hg19 build:

> query(hub, c("hg19", "2bit"))
AnnotationHub with 1 record
# snapshotDate(): 2019-05-15 
# names(): AH13964
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: TwoBitFile
# $rdatadateadded: 2014-12-15
# $title: hg19.2bit
# $description: UCSC 2 bit file for hg19 
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: TwoBit
# $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.2bit
# $sourcesize: NA
# $tags: c("2bit", "UCSC", "genome") 
# retrieve record with 'object[["AH13964"]]' 

and the following for the fruit fly:

> query(hub, c("2bit", "dm3"))
AnnotationHub with 1 record
# snapshotDate(): 2019-05-15 
# names(): AH14079
# $dataprovider: UCSC
# $species: Drosophila melanogaster
# $rdataclass: TwoBitFile
# $rdatadateadded: 2014-12-15
# $title: dm3.2bit
# $description: UCSC 2 bit file for dm3 
# $taxonomyid: 7227
# $genome: dm3
# $sourcetype: TwoBit
# $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/dm3/bigZips/dm3.2bit
# $sourcesize: NA
# $tags: c("2bit", "UCSC", "genome") 
# retrieve record with 'object[["AH14079"]]' 
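
Once one of these records is retrieved, sequence can be pulled from the TwoBitFile in much the same way as from a FaFile. A minimal sketch, assuming network access and using the dm3 record above; the region is an arbitrary example chosen only for illustration:

```r
library(AnnotationHub)
library(rtracklayer)    # provides import() for TwoBitFile objects
library(GenomicRanges)

hub <- AnnotationHub()

# Downloads dm3.2bit into the local cache on first use
dm3 <- hub[["AH14079"]]

# Sequence names and lengths stored in the 2bit file
seqinfo(dm3)

# Arbitrary 50 bp example region on chr2L
region <- GRanges("chr2L", IRanges(start = 100001, end = 100050))

# Returns a DNAStringSet with one sequence per requested range
seqs <- import(dm3, which = region)
seqs
```

The hg19 record (AH13964) can be used the same way, though the download is considerably larger.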

Thank you for the suggestion. I will give it a try.
