Question: Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?
Asked 3.8 years ago by efoss (United States):

In the course of following an online introduction to AnnotationHub, I came across the following code: 


library(AnnotationHub)

ah <- AnnotationHub()

# keep only the OrgDb records
orgs <- subset(ah, ah$rdataclass == "OrgDb")

This gives me records for 1145 organisms, but important organisms such as "Homo sapiens" and "Mus musculus" are missing. Why is this?

Thank you. 



Tags: orgdb, annotationhub
Answer: Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?
Answered 3.8 years ago by Valerie Obenchain (United States):

Hi Eric,

I'm able to retrieve the Homo.sapiens OrgDb. 

> length(orgs)
[1] 1019
> length(table(orgs$species))
[1] 1017
> query(orgs, "Homo.sapiens")
AnnotationHub with 1 record
# snapshotDate(): 2015-11-19 
# names(): AH49582
# $dataprovider:
# $species: Homo sapiens
# $rdataclass: OrgDb
# $title:
# $description: NCBI gene ID based annotations about Homo sapiens
# $taxonomyid: 9606
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl:,
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation 
# retrieve record with 'object[["AH49582"]]' 

> snapshotDate(ah)
[1] "2015-11-19"
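
For anyone repeating this check, here is a minimal sketch of searching the whole hub for an OrgDb by species instead of subsetting first. The helper name `find_orgdb` is hypothetical (it is not part of AnnotationHub); it only wraps `query()`, which keeps records matching every pattern in the vector. The live calls need network access, so they are shown commented out.

```r
# Hypothetical helper: OrgDb records whose metadata also match `species`.
# query() keeps only the records that match every pattern in the vector.
find_orgdb <- function(hub, species) {
  AnnotationHub::query(hub, c("OrgDb", species))
}

# Usage in a live session (needs network access, so it is left commented out):
# library(AnnotationHub)
# ah <- AnnotationHub()
# find_orgdb(ah, "Homo sapiens")
# find_orgdb(ah, "Mus musculus")
# org <- ah[["AH49582"]]   # record ID from the output above; IDs can differ between snapshots
```

If the species query comes back with 0 records, the snapshot date is the first thing worth comparing.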

What version of AnnotationHub are you using? Can you show the output of sessionInfo() and the code that caused the error (i.e., the attempt to extract Homo.sapiens that failed)?



(written 3.8 years ago by Valerie Obenchain)

Hi Valerie, 

Here is my sessionInfo():


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.0.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                  IRanges_2.2.9                digest_0.6.8                
 [4] mime_0.4                     GenomeInfoDb_1.4.3           R6_2.1.1                    
 [7] xtable_1.8-0                 DBI_0.3.1                    stats4_3.2.2                
[10] magrittr_1.5                 RSQLite_1.0.0                BiocInstaller_1.18.5        
[13] httr_1.0.0                   stringi_1.0-1                curl_0.9.4                  
[16] S4Vectors_0.6.6              tools_3.2.2                  stringr_1.0.0               
[19] Biobase_2.28.0               shiny_0.12.2                 httpuv_1.3.3                
[22] parallel_3.2.2               BiocGenerics_0.14.0          AnnotationDbi_1.30.1        
[25] htmltools_0.2.6              interactiveDisplayBase_1.6.1

I'm not sure which version of AnnotationHub I'm using. How do I determine that? Regardless, I think it must be up to date because I downloaded it just yesterday using biocLite. 

Could my problem be that I specified rdataclass to be "OrgDb"? I did so because I was following a tutorial here:

They did so to obtain information about the European rabbit ("Oryctolagus"). When I followed the same instructions but specified Mus musculus instead, no record was found. And when I look in the orgs variable I created as shown above, per the instructions in the tutorial, I have 1145 entries, none of which is human or mouse.



(written 3.8 years ago by efoss)

P.S. Here are some of the length commands you entered, run on my orgs, which was created as described above:

> length(orgs)
[1] 1145
> length(orgs$species)
[1] 1145
> length(table(orgs$species))
[1] 1145
> query(orgs, "Homo.sapiens")
AnnotationHub with 0 records
# snapshotDate(): 2015-08-26 
(written 3.8 years ago by efoss)

Thanks for showing the output. Your sessionInfo() output shows the version of AnnotationHub (code) you're using and the snapshotDate() shows the date when data in the db were last changed.
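
As a small sketch of the two checks just described (package version for the code, snapshot date for the data), something like the following could be used without scanning the full sessionInfo() output. The function names `pkg_version` and `hub_snapshot` are made up for illustration; `packageVersion()` and `snapshotDate()` are the real calls underneath.

```r
# Version of an installed package (the code); works for any installed package.
pkg_version <- function(pkg) as.character(utils::packageVersion(pkg))

# Snapshot date of a hub object (the data); needs AnnotationHub installed.
hub_snapshot <- function(ah) AnnotationHub::snapshotDate(ah)

# Usage in a live session (needs network access, so it is left commented out):
# library(AnnotationHub)
# ah <- AnnotationHub()
# pkg_version("AnnotationHub")
# hub_snapshot(ah)
```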

Several weeks ago we had a problem with the OrgDbs not being tagged with the correct biocVersion. The problem was fixed in October, probably after the snapshotDate of 2015-08-26 that you're using. The most recent snapshotDate is 2015-11-19 (shown in my output).

The data should automatically update when you load the package and create a new hub object in a fresh session:


library(AnnotationHub)
ah <- AnnotationHub()

You should see the 2015-11-19 date included in possibleDates(ah). If the data don't update and you are still having problems, you can try removing the cache; see ?removeCache
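
A rough sketch of that check, assuming possibleDates() returns date strings that as.Date() can parse: the helper `is_snapshot_stale` is hypothetical and uses only base R, so it can be tried with the two dates from this thread. The hub calls are commented out because they need network access, and removeCache() deletes the local cache, so it is best kept as a last resort.

```r
# Hypothetical helper: TRUE when a newer snapshot than the current one exists.
is_snapshot_stale <- function(current, available) {
  as.Date(current) < max(as.Date(available))
}

# The two snapshot dates mentioned in this thread:
is_snapshot_stale("2015-08-26", c("2015-08-26", "2015-11-19"))  # TRUE

# In a live session (needs network access, so it is left commented out):
# library(AnnotationHub)
# ah <- AnnotationHub()
# dates <- possibleDates(ah)
# if (is_snapshot_stale(snapshotDate(ah), dates)) {
#   snapshotDate(ah) <- dates[which.max(as.Date(dates))]  # switch to the newest snapshot
# }
# removeCache(ah)   # only if the data still do not update; see ?removeCache
```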


(written 3.8 years ago by Valerie Obenchain)

Hi Valerie, 

That fixed it. Thanks so much for the help. 


(written 3.8 years ago by efoss)