AnnotationHub: can't extract database
ecpierce • 0
@ecpierce-12798
Last seen 6.9 years ago

I'm trying to extract a certain entry from AnnotationHub, but I keep getting a "Public" error. Any idea how to fix this?

library(AnnotationHub)
hub <- AnnotationHub()

org.PA01.eg.db <- hub[["AH48516"]]

Error in hub[["AH48516"]] : Public

I am able to extract other entries, e.g. "AH4".

annotationhub • 2.3k views
ADD COMMENT
@valerie-obenchain-4275
Last seen 2.2 years ago
United States

Hi,

Your sessionInfo() looks good. I see you're using the current release (Bioconductor 3.4) and have the current version of AnnotationHub loaded. Bioconductor 3.4 was released in October 2016, which is why the default snapshot date is 2016-10-11 when you load AnnotationHub:

> hub = AnnotationHub()
snapshotDate(): 2016-10-11

All snapshot dates for a particular release can be seen by calling possibleDates() on the AnnotationHub object. Note that the 2016-10-11 date is the max:

> possibleDates(hub)
[1] "2013-03-19" "2013-03-21" "2013-03-26" "2013-04-04" "2013-04-29"
[6] "2013-06-24" "2013-06-25" "2013-06-26" "2013-06-27" "2013-10-29"
...
[46] "2015-08-17" "2015-08-26" "2015-12-28" "2015-12-29" "2016-01-25"
[51] "2016-03-07" "2016-05-03" "2016-05-25" "2016-06-06" "2016-07-20"
[56] "2016-08-15" "2016-10-11"
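
For what it's worth, the snapshot date can also be set by hand, which is one way to check what the hub looked like at an earlier date. A minimal sketch, assuming a working internet connection and that the chosen date ("2016-06-06" here, picked arbitrarily from the list above) appears in possibleDates(hub) for your installation:

```r
library(AnnotationHub)

hub <- AnnotationHub()

# The date must be one of the values returned by possibleDates(hub);
# "2016-06-06" is only an example taken from the output above.
stopifnot("2016-06-06" %in% possibleDates(hub))
snapshotDate(hub) <- "2016-06-06"

# Re-run the query against the older snapshot.
length(query(hub, "Pseudomonas"))
```

The result depends on the AnnotationHub version and the snapshot chosen, so no expected count is shown.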

I can also reproduce your results:

> length(query(hub,"Pseudomonas"))
[1] 1

So all of that looks good. As for your coworker on Windows, I still can't see the output of sessionInfo(). The commands you show above use a snapshot date of 2017-04-10, which would indicate they are using R/Bioconductor devel. If I load AnnotationHub in devel I see:

> hub = AnnotationHub()
snapshotDate(): 2017-04-10

As a side note, you'll see that the number of possibleDates() for devel is different:

> possibleDates(hub)
 [1] "2013-03-19" "2013-03-21" "2013-03-26" "2013-04-04" "2013-04-29"
 [6] "2013-06-24" "2013-06-25" "2013-06-26" "2013-06-27" "2013-10-29"
...
[56] "2016-08-15" "2016-10-11" "2016-11-03" "2016-11-08" "2016-11-09"
[61] "2016-11-13" "2016-11-14" "2016-12-22" "2016-12-28" "2017-01-05"
[66] "2017-02-07" "2017-04-03" "2017-04-04" "2017-04-05" "2017-04-10"
[71] "2017-04-10"

When I query the devel hub for 'Pseudomonas' I still get the one record:

> query(hub,"Pseudomonas")
AnnotationHub with 1 record
# snapshotDate(): 2017-04-10
# names(): AH10565
# $dataprovider: Inparanoid8
# $species: Pseudomonas aeruginosa
# $rdataclass: Inparanoid8Db
# $title: hom.Pseudomonas_aeruginosa.inp8.sqlite
# $description: Inparanoid 8 annotations about Pseudomonas aeruginosa
# $taxonomyid: 208964
# $genome: inparanoid8 genomes
# $sourcetype: Inparanoid
# $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/P.aerug...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("Inparanoid", "Gene", "Homology", "Annotation")
# retrieve record with 'object[["AH10565"]]'

Unless we see the sessionInfo() from your coworker and get a little more information we can't help. We need to be able to reproduce the error, or unexpected results, to get to the bottom of this.

Valerie

ADD COMMENT

Hi, 

Sorry for the delay.  There is a limit on how frequently I can post here.  Here is her sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] AnnotationHub_2.4.2 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  IRanges_2.6.1                 digest_0.6.12                
 [4] mime_0.5                      R6_2.2.0                      xtable_1.8-2
 [7] DBI_0.6-1                     stats4_3.3.1                  RSQLite_1.1-2                
[10] BiocInstaller_1.22.3          httr_1.2.1                    curl_2.4                     
[13] S4Vectors_0.10.3              tools_3.3.1                   Biobase_2.32.0               
[16] shiny_1.0.1                   httpuv_1.3.3                  AnnotationDbi_1.34.4         
[19] memoise_1.0.0                 htmltools_0.3.5               interactiveDisplayBase_1.10.3

ADD REPLY

The problem is with the version of AnnotationHub: it's 2 (soon to be 3) releases old. The current release is version 2.6.5, seen here:

http://www.bioconductor.org/checkResults/release/bioc-LATEST/

The old 2.4.2 version does not have the correct logic implemented for the OrgDb objects. The data in an OrgDb is a snapshot of the most current information known at the time (as opposed to being tied to a genome build, as the TxDb packages are). It quickly becomes out of date as updates are submitted to the source, be it UCSC, NCBI, etc. Because of this we want all OrgDbs to be refreshed every 6 months before a release. The OrgDbs your coworker has listed are from 2014 and 2015:

> library(DBI)
> con = dbconn(hub)
> table(dbGetQuery(con, "select rdatadateadded from resources where title like '%Pseudomonas%'"))

2014-03-31 2014-07-02 2014-07-09 2015-07-27 
         1          1         25          9 

It would make sense to use these annotations if you were trying to reproduce an analysis from 2014, but not if you are doing exploratory work / analysis today. I would recommend the coworker update R/Bioconductor and use the most current packages.
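
A quick way to see whether an installed package is behind is to compare version numbers directly. This is only a sketch using the two versions quoted in this thread; the variable names are made up for illustration:

```r
# Versions quoted in this thread: the coworker's AnnotationHub (2.4.2)
# and the current release (2.6.5).
installed <- numeric_version("2.4.2")
current   <- numeric_version("2.6.5")

# TRUE means the installed package is older than the current release.
installed < current

# On your own machine, replace the hard-coded value with:
# installed <- packageVersion("AnnotationHub")
```

numeric_version() compares component by component, so "2.4.2" sorts before "2.6.5" (and "2.10.0" would sort after both), which a plain string comparison would not guarantee.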

As for the Pseudomonas record in the current AnnotationHub, we can see that's not an OrgDb but the Inparanoid8 database package:

> mcols(query(hub,"Pseudomonas"))[c("title", "rdatadateadded", "rdataclass")]
DataFrame with 1 row and 3 columns
                                         title rdatadateadded    rdataclass
                                   <character>    <character>   <character>
AH10565 hom.Pseudomonas_aeruginosa.inp8.sqlite     2014-03-31 Inparanoid8Db

 

The Inparanoid db has been static for some time now with the last update in 2013 (http://inparanoid.sbc.su.se/cgi-bin/index.cgi). We propagate but do not rebuild the package with each release. 

Valerie

PS: FYI there are a few more Inparanoid resources in the main Bioconductor repository if you're interested but these are also from 2014:

http://www.bioconductor.org/packages/release/BiocViews.html#___InparanoidDb

ADD REPLY
@james-w-macdonald-5106
Last seen 4 hours ago
United States

Where did you get that hub name? I don't see anything in that range:

> hub <- AnnotationHub()

> z <- mcols(hub)

> grep("AH4851[0-9]", row.names(z), value = TRUE)
character(0)
> grep("AH485[0-9][0-9]", row.names(z), value = TRUE)
character(0)

> grep("AH48[0-9][0-9][0-9]", row.names(z), value = TRUE)
[1] "AH48000" "AH48001" "AH48002" "AH48003" "AH48004" "AH48005"

So the only hub values between AH48000 and AH48999 are these 6, which don't include the one you are looking for.
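
The same check can be written as a simple membership test. A small sketch, with the IDs hard-coded from the output in this thread so it runs without a hub; with a live hub you would test against names(hub) instead:

```r
# IDs copied from the grep() output above plus the one Pseudomonas record.
# With a real hub object: ids <- names(hub)
ids <- c("AH48000", "AH48001", "AH48002", "AH48003", "AH48004", "AH48005",
         "AH10565")

wanted <- "AH48516"

# FALSE: the requested ID is not among the available records.
wanted %in% ids
```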

The only thing I find for Pseudomonas aeruginosa is this:

> query(hub, c("pseudomonas"))
AnnotationHub with 1 record
# snapshotDate(): 2016-10-11
# names(): AH10565
# $dataprovider: Inparanoid8
# $species: Pseudomonas aeruginosa
# $rdataclass: Inparanoid8Db
# $title: hom.Pseudomonas_aeruginosa.inp8.sqlite
# $description: Inparanoid 8 annotations about Pseudomonas aeruginosa
# $taxonomyid: 208964
# $genome: inparanoid8 genomes
# $sourcetype: Inparanoid
# $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/P.aerug...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("Inparanoid", "Gene", "Homology", "Annotation")
# retrieve record with 'object[["AH10565"]]'
> z["AH10565",]
DataFrame with 1 row and 14 columns
                                         title dataprovider
                                   <character>  <character>
AH10565 hom.Pseudomonas_aeruginosa.inp8.sqlite  Inparanoid8
                       species taxonomyid              genome
                   <character>  <integer>         <character>
AH10565 Pseudomonas aeruginosa     208964 inparanoid8 genomes
                                                  description
                                                  <character>
AH10565 Inparanoid 8 annotations about Pseudomonas aeruginosa
        coordinate_1_based                        maintainer rdatadateadded
                 <integer>                       <character>    <character>
AH10565                  1 Marc Carlson <mcarlson@fhcrc.org>     2014-03-31
                    preparerclass                         tags    rdataclass
                      <character>                       <list>   <character>
AH10565 Inparanoid8ImportPreparer Inparanoid,Gene,Homology,... Inparanoid8Db
                                                                  sourceurl
                                                                <character>
AH10565 http://inparanoid.sbc.su.se/download/current/Orthologs/P.aeruginosa
         sourcetype
        <character>
AH10565  Inparanoid

Which is an Inparanoid homology mapping database.

ADD COMMENT

I got the number from this thread, because my query didn't seem to be working: "error while making database using Annotationforge package"

I get the exact same output as you do when I submit this Pseudomonas query. However, when my coworker (on a Windows machine) runs this query, she also gets an entire list of Pseudomonas OrgDb results. So it seems like they should be there... She is using a slightly newer snapshot of AnnotationHub. I tried uninstalling and reinstalling AnnotationHub, but the snapshot date stays the same.

ADD REPLY

Please show the exact commands you and your coworker are running and include the output from sessionInfo(). It does not matter if you are on Windows, Linux or Mac; what matters is the version of R/Bioconductor you are using.
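
If copying from the console is awkward, the output can be written to a text file and pasted from there. A minimal sketch in base R; the file name is arbitrary:

```r
# Capture what sessionInfo() prints as a character vector, one element per line.
info <- capture.output(sessionInfo())

# Write it to a file (the location and name here are just an example).
out_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(info, out_file)

# Show where the file ended up.
out_file
```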

Valerie

ADD REPLY

These are my coworker's commands:

> ah=AnnotationHub()
snapshotDate(): 2017-04-10
> query(ah,"Pseudomonas")
AnnotationHub with 35 records
# snapshotDate(): 2017-04-10 
# $dataprovider: NCBI, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, Inparanoid8
# $species: Pseudomonas aeruginosa, Pseudomonas aeruginosa_PAO1, Pseudomonas fluorescens_SBW25, Pseudomonas protegens_Pf-5, Pseudomonas putida...
# $rdataclass: OrgDb, Inparanoid8Db
# additional mcols(): taxonomyid, genome, description, tags, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH10565"]]' 

            title                                                
  AH10565 | hom.Pseudomonas_aeruginosa.inp8.sqlite               
  AH12818 | org.Pseudomonas_mendocina_NK-01.eg.sqlite            
  AH12869 | org.Pseudomonas_putida_KT2440.eg.sqlite              
  AH12938 | org.Pseudomonas_syringae_pv._syringae_B728a.eg.sqlite
  AH12940 | org.Pseudomonas_fluorescens_Pf0-1.eg.sqlite          
  ...       ...                                                  
  AH48510 | org.Pseudomonas_aeruginosa_PAO1.eg.sqlite            
  AH48516 | org.Pseudomonas_putida_KT2440.eg.sqlite              
  AH48538 | org.Pseudomonas_syringae_pv._syringae_B728a.eg.sqlite
  AH48582 | org.Pseudomonas_mendocina_ymp.eg.sqlite              
  AH48621 | org.Pseudomonas_stutzeri_A1501.eg.sqlite 

 

And here are mine:

> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2016-10-11

> query(hub,"Pseudomonas")

AnnotationHub with 1 record
# snapshotDate(): 2016-10-11 
# names(): AH10565
# $dataprovider: Inparanoid8
# $species: Pseudomonas aeruginosa
# $rdataclass: Inparanoid8Db
# $title: hom.Pseudomonas_aeruginosa.inp8.sqlite
# $description: Inparanoid 8 annotations about Pseudomonas aeruginosa
# $taxonomyid: 208964
# $genome: inparanoid8 genomes
# $sourcetype: Inparanoid
# $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/P.aerugi...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("Inparanoid", "Gene", "Homology", "Annotation") 
# retrieve record with 'object[["AH10565"]]' 

ADD REPLY

We need to see the output of sessionInfo() for both you and your coworker. This will show the R/Bioconductor package versions you're using. For example, here is mine -

> sessionInfo()
R Under development (unstable) (2017-03-15 r72352)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 24 (Workstation Edition)

Matrix products: default
BLAS: /home/vobencha/R/R-dev/trunk/lib/libRblas.so
LAPACK: /home/vobencha/R/R-dev/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] AnnotationHub_2.7.14 BiocGenerics_0.21.3 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  IRanges_2.9.19               
 [3] digest_0.6.12                 mime_0.5                     
 [5] R6_2.2.0                      xtable_1.8-2                 
 [7] DBI_0.6-1                     stats4_3.4.0                 
 [9] RSQLite_1.1-2                 BiocInstaller_1.25.3         
[11] httr_1.2.1                    S4Vectors_0.13.15            
[13] Biobase_2.35.1                shiny_1.0.1                  
[15] httpuv_1.3.3                  yaml_2.1.14                  
[17] compiler_3.4.0                AnnotationDbi_1.37.4         
[19] memoise_1.0.0                 htmltools_0.3.5              
[21] interactiveDisplayBase_1.13.0

ADD REPLY

This is my sessionInfo():

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] AnnotationHub_2.6.5 BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.24.0         
 [3] plyr_1.8.4                    tools_3.3.3                  
 [5] digest_0.6.12                 RSQLite_1.1-2                
 [7] DOSE_3.0.10                   memoise_1.0.0                
 [9] tibble_1.3.0                  gtable_0.2.0                 
[11] fastmatch_1.1-0               igraph_1.0.1                 
[13] shiny_1.0.1                   DBI_0.6-1                    
[15] curl_2.4                      yaml_2.1.14                  
[17] fgsea_1.0.2                   gridExtra_2.2.1              
[19] stringr_1.2.0                 httr_1.2.1.9000              
[21] clusterProfiler_3.2.14        S4Vectors_0.12.2             
[23] IRanges_2.8.2                 stats4_3.3.3                 
[25] grid_3.3.3                    qvalue_2.6.0                 
[27] data.table_1.10.4             Biobase_2.34.0               
[29] R6_2.2.0                      AnnotationDbi_1.36.2         
[31] BiocParallel_1.8.2            GOSemSim_2.0.4               
[33] GO.db_3.4.0                   ggplot2_2.2.1                
[35] DO.db_2.9                     reshape2_1.4.2               
[37] tidyr_0.6.1                   magrittr_1.5                 
[39] htmltools_0.3.5               scales_0.4.1                 
[41] splines_3.3.3                 xtable_1.8-2                 
[43] mime_0.5                      interactiveDisplayBase_1.12.0
[45] colorspace_1.3-2              httpuv_1.3.3                 
[47] stringi_1.1.5                 lazyeval_0.2.0               
[49] munsell_0.4.3                

ADD REPLY
