How to manage AnnotationHub snapshots / most OrgDbs are missing?
Jenny Drnevich ★ 2.0k

I have projects split between R 3.6.3 and 4.0.0 and I am having trouble properly managing cached versions / snapshotDates of AnnotationHub resources. I first noticed this because apparently the newest snapshotDate "2020-04-27" is missing most of the OrgDb from NCBI. The AnnotationHub How To vignette has:

library(AnnotationHub)
ah <- AnnotationHub()
## snapshotDate(): 2020-03-31
query(ah, "OrgDb")
## AnnotationHub with 1708 records
## # snapshotDate(): 2020-03-31

However, there is a new snapshotDate available but it is missing most of these OrgDb:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2020-04-27
> query(ah, "OrgDb")
AnnotationHub with 19 records
# snapshotDate(): 2020-04-27

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.20.0 BiocFileCache_1.12.0 dbplyr_1.4.3         BiocGenerics_0.34.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6                  pillar_1.4.4                  compiler_4.0.0                BiocManager_1.30.10          
 [5] later_1.0.0                   tools_4.0.0                   digest_0.6.25                 bit_1.1-15.2                 
 [9] RSQLite_2.2.0                 memoise_1.1.0                 lifecycle_0.2.0               tibble_3.0.1                 
[13] pkgconfig_2.0.3               rlang_0.4.6                   shiny_1.4.0.2                 DBI_1.1.0                    
[17] rstudioapi_0.11               curl_4.3                      yaml_2.2.1                    fastmap_1.0.1                
[21] dplyr_0.8.5                   httr_1.4.1                    IRanges_2.22.1                vctrs_0.3.0                  
[25] S4Vectors_0.26.1              rappdirs_0.3.1                stats4_4.0.0                  bit64_0.9-7                  
[29] tidyselect_1.1.0              Biobase_2.48.0                glue_1.4.1                    R6_2.4.1                     
[33] AnnotationDbi_1.50.0          purrr_0.3.4                   blob_1.2.1                    magrittr_1.5                 
[37] promises_1.1.0                ellipsis_0.3.1                htmltools_0.4.0               assertthat_0.2.1             
[41] xtable_1.8-4                  mime_0.9                      interactiveDisplayBase_1.26.0 httpuv_1.5.2                 
[45] crayon_1.3.4                  BiocVersion_3.11.1 

The main AnnotationHub vignette seems to say that I can switch to a different snapshotDate by simply doing the below, but it doesn't work; I still get the same truncated list of OrgDbs:

> possibleDates(ah)
  [1] "2013-03-19" "2013-03-21" "2013-03-26" "2013-04-04" "2013-04-29" "2013-06-24" "2013-06-25" "2013-06-26" "2013-06-27"
# ...
[127] "2019-10-29" "2020-01-28" "2020-02-28" "2020-03-31" "2020-04-27"
> snapshotDate(ah) <- "2020-03-31"
> query(ah, "OrgDb")
AnnotationHub with 19 records
# snapshotDate(): 2020-03-31

I am also having weird issues when running two instances of RStudio at the same time, one with R 4.0.0 and one with R 3.6.3. I am having trouble replicating everything weird that I saw but this seems to be replicable:

# 1. Open RStudio running R 4.0.0. Force a refresh of the cache with:
> library(AnnotationHub)
> ah <- refreshHub(hubClass="AnnotationHub")
  |================================================================| 100%

snapshotDate(): 2020-04-27
> query(ah, "OrgDb")
AnnotationHub with 19 records
# snapshotDate(): 2020-04-27

# 2. Open another RStudio running 3.6.3. I had previously downloaded a snapshotDate of "2019-10-29" and it seems to find this one at first:
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-10-29
> query(ah, "OrgDb")
AnnotationHub with 1708 records
# snapshotDate(): 2019-10-29 
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.18.0 BiocFileCache_1.10.2 dbplyr_1.4.2         BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6                  later_1.0.0                   pillar_1.4.3                 
 [4] compiler_3.6.3                BiocManager_1.30.10           tools_3.6.3                  
 [7] digest_0.6.25                 bit_1.1-15.2                  RSQLite_2.2.0                
[10] memoise_1.1.0                 lifecycle_0.2.0               tibble_3.0.0                 
[13] pkgconfig_2.0.3               rlang_0.4.5                   shiny_1.4.0.2                
[16] DBI_1.1.0                     cli_2.0.2                     rstudioapi_0.11              
[19] curl_4.3                      yaml_2.2.1                    fastmap_1.0.1                
[22] dplyr_0.8.5                   httr_1.4.1                    IRanges_2.20.2               
[25] vctrs_0.2.4                   S4Vectors_0.24.3              rappdirs_0.3.1               
[28] stats4_3.6.3                  bit64_0.9-7                   tidyselect_1.0.0             
[31] Biobase_2.46.0                glue_1.4.0                    R6_2.4.1                     
[34] AnnotationDbi_1.48.0          fansi_0.4.1                   purrr_0.3.3                  
[37] blob_1.2.1                    magrittr_1.5                  promises_1.1.0               
[40] ellipsis_0.3.0                htmltools_0.4.0               assertthat_0.2.1             
[43] xtable_1.8-4                  mime_0.9                      interactiveDisplayBase_1.24.0
[46] httpuv_1.5.2                  crayon_1.3.4                  BiocVersion_3.10.1 

# 3. Switch back to 4.0.0 and then refresh hub again

> ah <- refreshHub(hubClass="AnnotationHub")
  |================================================================| 100%

snapshotDate(): 2020-04-27
> query(ah, "OrgDb")
AnnotationHub with 19 records
# snapshotDate(): 2020-04-27


# 4. Switch back to 3.6.3; query OrgDb and see that it is wrong. Refresh hub and query again

> query(ah, "OrgDb")
AnnotationHub with 0 records
# snapshotDate(): 2019-10-29
> ah <- refreshHub(hubClass="AnnotationHub")
  |=======================================================================================| 100%

snapshotDate(): 2019-10-29
> query(ah, "OrgDb")
AnnotationHub with 1708 records
# snapshotDate(): 2019-10-29

# 5. Switch back to 4.0.0 and query for OrgDb again and see that it is wrong:
> query(ah, "OrgDb")
AnnotationHub with 0 records
# snapshotDate(): 2020-04-27

I've been wading through the help page for ?AnnotationHub and I probably have to do some combination of the cache and localHub options but I cannot find any good examples of how to do this. So my specific questions are:

  1. How do I set and switch between local caches of specific snapshotDates for different versions of R?
  2. How do I switch to the 2020-03-31 snapshotDate in 4.0.0?
  3. What happened to all the OrgDb packages in snapshotDate 2020-04-27?!?

Thanks

AnnotationHub OrgDb snapshotDate • 2.7k views
shepherl 4.1k

OrgDbs are a special case because they are only valid for a particular Bioconductor version. So you would have to switch the Bioconductor version in order to use an older snapshot date for the OrgDbs (otherwise what you are doing is correct). I am in the process of adding the OrgDbs from NCBI. There is an error in the code that generates them, which I am currently debugging, but I hope to have them uploaded in the next few days so they are available for R 4.0.0 with Bioc 3.11/Bioc 3.12.
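
A minimal sketch of one way to keep the two setups apart, assuming the cross-talk between the R 3.6.3 and R 4.0.0 sessions described in the question comes from both sessions sharing the same default cache directory (that is an assumption, not something confirmed in this thread). The helper names `hub_cache_for` and `open_versioned_hub` and the `~/AnnotationHubCache` base directory are hypothetical; `cache` is an argument of `AnnotationHub()`, and `BiocManager::version()` reports the Bioconductor version in use.

```r
# Hypothetical helper: build a cache path that is unique per Bioconductor version,
# e.g. <base>/bioc-3.10 for R 3.6.3 and <base>/bioc-3.11 for R 4.0.0.
hub_cache_for <- function(bioc_version,
                          base = file.path(path.expand("~"), "AnnotationHubCache")) {
  file.path(base, paste0("bioc-", as.character(bioc_version)))
}

# Hypothetical helper: open the hub against the version-specific cache.
# Contacts the hub server, so it needs network access on first use.
open_versioned_hub <- function(base = file.path(path.expand("~"), "AnnotationHubCache")) {
  stopifnot(
    requireNamespace("AnnotationHub", quietly = TRUE),
    requireNamespace("BiocManager", quietly = TRUE)
  )
  cache <- hub_cache_for(BiocManager::version(), base)
  # Create the directory up front so the constructor finds an existing cache location.
  dir.create(cache, recursive = TRUE, showWarnings = FALSE)
  AnnotationHub::AnnotationHub(cache = cache)
}
```

Calling `ah <- open_versioned_hub()` in each R installation would then give every Bioconductor release its own metadata and downloads, at the cost of some duplicated disk space.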


I apologize for the inconvenience this has caused and will let you know when they are available.


No worries! Thank you for all your work on AnnotationHub. We work with many non-model organisms and AnnotationHub is extremely useful.


Thanks for clarifying this, but it would have been so nice if this special case were documented as such somewhere. I just spent an hour or so trying to figure this out. In other words, can this be documented? Or did I not look in the right place?

Philip


I'll look through the package documentation and see if there is an appropriate place to mention this, if it is not already mentioned.

shepherl 4.1k

The OrgDbs have been added. Please let me know if you have any further issues.

> BiocManager::version()
[1] '3.11'
> ah = AnnotationHub()
snapshotDate(): 2020-04-27
> query(ah, "OrgDb")
AnnotationHub with 1480 records
# snapshotDate(): 2020-04-27
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Escherichia coli, [Nectria] haematococca_mpVI_77-13-4, Zymosepto...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH79568"]]' 

            title                                        
  AH79568 | org.Ag.eg.db.sqlite                          
  AH79569 | org.At.tair.db.sqlite                        
  AH79570 | org.Bt.eg.db.sqlite                          
  AH79571 | org.Cf.eg.db.sqlite                          
  AH79572 | org.Gg.eg.db.sqlite                          
  ...       ...                                          
  AH81959 | org.Bathycoccus_prasinos.eg.sqlite           
  AH81960 | org.Kwoniella_pini_CBS_10737.eg.sqlite       
  AH81961 | org.Burkholderia_cepacia_ATCC_25416.eg.sqlite
  AH81962 | org.Burkholderia_cepacia_DSM_7288.eg.sqlite  
  AH81963 | org.Burkholderia_cepacia_LMG_1222.eg.sqlite  
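
To check the same thing from a script rather than by reading the printed summary, something along these lines may help. It is only a sketch: `orgdb_summary` is a hypothetical helper name, and it relies on `query()`, `snapshotDate()` and `length()` behaving on a hub object as they do in the session above.

```r
# Hypothetical helper: report the snapshot date and the number of OrgDb records
# visible in an already-constructed hub object.
orgdb_summary <- function(ah) {
  stopifnot(requireNamespace("AnnotationHub", quietly = TRUE))
  orgdbs <- AnnotationHub::query(ah, "OrgDb")
  list(
    snapshot = AnnotationHub::snapshotDate(ah),  # snapshot the hub is pinned to
    n_orgdb  = length(orgdbs)                    # number of matching records
  )
}

# Usage (contacts the hub server, so it needs network access):
# ah <- AnnotationHub::AnnotationHub()
# orgdb_summary(ah)
```

Running it once per R installation makes it easy to compare what each Bioconductor version sees.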
