Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?
efoss ▴ 10
@efoss-8908
Last seen 3.4 years ago
United States

In the course of following an online introduction to AnnotationHub, I came across the following code: 

 

library("AnnotationHub")
ah <- AnnotationHub()

orgs <- subset(ah, ah$rdataclass == "OrgDb")

This provides me with data from 1145 organisms, but missing are such important organisms as "Homo sapiens" and "Mus musculus". Why is this? 

Thank you. 

Eric

 

@valerie-obenchain-4275
Last seen 3.0 years ago
United States

Hi Eric,

I'm able to retrieve the Homo.sapiens OrgDb. 

> length(orgs)
[1] 1019
> length(table(orgs$species))
[1] 1017
> query(orgs, "Homo.sapiens")
AnnotationHub with 1 record
# snapshotDate(): 2015-11-19 
# names(): AH49582
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Homo sapiens
# $rdataclass: OrgDb
# $title: org.Hs.eg.db.sqlite
# $description: NCBI gene ID based annotations about Homo sapiens
# $taxonomyid: 9606
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation 
# retrieve record with 'object[["AH49582"]]' 

> snapshotDate(ah)
[1] "2015-11-19"

What version of AnnotationHub are you using? Can you show the output of sessionInfo() and the code that failed (i.e., where you tried to extract Homo.sapiens but couldn't)?

Valerie

 


Hi Valerie, 

Here is my sessionInfo():

 

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.0.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                  IRanges_2.2.9                digest_0.6.8                
 [4] mime_0.4                     GenomeInfoDb_1.4.3           R6_2.1.1                    
 [7] xtable_1.8-0                 DBI_0.3.1                    stats4_3.2.2                
[10] magrittr_1.5                 RSQLite_1.0.0                BiocInstaller_1.18.5        
[13] httr_1.0.0                   stringi_1.0-1                curl_0.9.4                  
[16] S4Vectors_0.6.6              tools_3.2.2                  stringr_1.0.0               
[19] Biobase_2.28.0               shiny_0.12.2                 httpuv_1.3.3                
[22] parallel_3.2.2               BiocGenerics_0.14.0          AnnotationDbi_1.30.1        
[25] htmltools_0.2.6              interactiveDisplayBase_1.6.1

I'm not sure which version of AnnotationHub I'm using. How do I determine that? Regardless, I think it must be up to date because I downloaded it just yesterday using biocLite. 

Could my problem be that I specified rdataclass to be "OrgDb"? I did so because I was following a tutorial here: 

https://www.bioconductor.org/help/workflows/annotation/Annotation_Resources/

They did so to obtain information about the European rabbit ("Oryctolagus"), but when I followed along with the same instructions but instead specified Mus musculus, it didn't have information for that. And when I look in the orgs variable I created as shown above, per the instructions in the tutorial, I have 1145 entries, none of which is human or mouse. 

Thanks. 

Eric


P.S. Here are some of the length commands you entered, run on my orgs, which was created as described above: 

> length(orgs)
[1] 1145
> length(orgs$species)
[1] 1145
> length(table(orgs$species))
[1] 1145
> query(orgs, "Homo.sapiens")
AnnotationHub with 0 records
# snapshotDate(): 2015-08-26 
> 

Thanks for showing the output. Your sessionInfo() output shows the version of AnnotationHub (code) you're using and the snapshotDate() shows the date when data in the db were last changed.
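
For reference, both values can also be checked directly from the console. This is only a minimal sketch, assuming AnnotationHub is installed and the hub is reachable from your machine:

```r
library(AnnotationHub)

# Version of the installed AnnotationHub package (the code)
packageVersion("AnnotationHub")

# Date of the metadata snapshot this hub object is using (the data)
ah <- AnnotationHub()
snapshotDate(ah)
```

The two are independent: an up-to-date package can still be pointing at an older snapshot.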

Several weeks ago we had a problem with the OrgDbs not being tagged with the correct biocVersion. The problem was fixed in October, probably after the snapshotDate of 2015-08-26 that you're using. The most recent snapshotDate is 2015-11-19 (shown in my output).

The data should automatically update when you load the package and create a new hub object in a fresh session:

library(AnnotationHub)

ah <- AnnotationHub()

You should see the 2015-11-19 date included in possibleDates(ah). If the data don't update and you are still having problems, you can try removing the cache; see ?removeCache.
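
A rough sketch of that check, in case it helps. It assumes possibleDates(ah) returns date strings that as.Date() can parse, and the variable name newest is just for illustration:

```r
library(AnnotationHub)

ah <- AnnotationHub()

# Snapshot currently in use, and all snapshots available to this hub
snapshotDate(ah)
possibleDates(ah)

# Pick the most recent available date and select it explicitly
# (assumes the dates parse with as.Date())
newest <- as.character(max(as.Date(possibleDates(ah))))
snapshotDate(ah) <- newest

# Re-create the OrgDb subset and check for human and mouse
orgs <- subset(ah, ah$rdataclass == "OrgDb")
c("Homo sapiens", "Mus musculus") %in% orgs$species
```

If the last line still shows FALSE for either species after switching to the newest snapshot, the cache is the next thing to look at.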

Valerie


Hi Valerie, 

That fixed it. Thanks so much for the help. 

Eric
