Question: Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?
efoss (United States) wrote, 24 months ago:

In the course of following an online introduction to AnnotationHub, I came across the following code: 

 

library("AnnotationHub")
ah <- AnnotationHub()

orgs <- subset(ah, ah$rdataclass == "OrgDb")

This returns records for 1145 organisms, but important organisms such as "Homo sapiens" and "Mus musculus" are missing. Why is this?

Thank you. 

Eric

 

Valerie Obenchain (United States) wrote, 24 months ago:

Hi Eric,

I'm able to retrieve the Homo.sapiens OrgDb. 

> length(orgs)
[1] 1019
> length(table(orgs$species))
[1] 1017
> query(orgs, "Homo.sapiens")
AnnotationHub with 1 record
# snapshotDate(): 2015-11-19 
# names(): AH49582
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Homo sapiens
# $rdataclass: OrgDb
# $title: org.Hs.eg.db.sqlite
# $description: NCBI gene ID based annotations about Homo sapiens
# $taxonomyid: 9606
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation 
# retrieve record with 'object[["AH49582"]]' 

> snapshotDate(ah)
[1] "2015-11-19"
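
To check for particular species directly rather than scanning the subset by eye, a small helper along these lines might be useful. This is only a sketch: `missing_species` is a hypothetical name, not an AnnotationHub function, and the species vector in the offline call is made up for illustration. The commented-out part assumes AnnotationHub is installed and the hub is reachable.

```r
# Hypothetical helper (not part of AnnotationHub): return the wanted species
# that do not appear in a character vector of available species.
missing_species <- function(available, wanted) {
  setdiff(wanted, unique(available))
}

# Offline illustration with a made-up species vector:
missing_species(c("Oryctolagus cuniculus", "Homo sapiens"),
                c("Homo sapiens", "Mus musculus"))
# returns "Mus musculus"

# Against a real hub (needs the AnnotationHub package and network access):
# library(AnnotationHub)
# ah <- AnnotationHub()
# orgs <- subset(ah, ah$rdataclass == "OrgDb")
# missing_species(orgs$species, c("Homo sapiens", "Mus musculus"))
```

An empty result (`character(0)`) would mean every requested species has at least one OrgDb record in the current snapshot.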

What version of AnnotationHub are you using? Can you show the output of sessionInfo() and the code that caused the error (i.e., the code that tried to extract Homo.sapiens but couldn't)?

Valerie

 


Hi Valerie, 

Here is my sessionInfo():

 

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.0.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                  IRanges_2.2.9                digest_0.6.8                
 [4] mime_0.4                     GenomeInfoDb_1.4.3           R6_2.1.1                    
 [7] xtable_1.8-0                 DBI_0.3.1                    stats4_3.2.2                
[10] magrittr_1.5                 RSQLite_1.0.0                BiocInstaller_1.18.5        
[13] httr_1.0.0                   stringi_1.0-1                curl_0.9.4                  
[16] S4Vectors_0.6.6              tools_3.2.2                  stringr_1.0.0               
[19] Biobase_2.28.0               shiny_0.12.2                 httpuv_1.3.3                
[22] parallel_3.2.2               BiocGenerics_0.14.0          AnnotationDbi_1.30.1        
[25] htmltools_0.2.6              interactiveDisplayBase_1.6.1

I'm not sure which version of AnnotationHub I'm using. How do I determine that? Regardless, I think it must be up to date because I downloaded it just yesterday using biocLite. 

Could my problem be that I specified rdataclass to be "OrgDb"? I did so because I was following a tutorial here: 

https://www.bioconductor.org/help/workflows/annotation/Annotation_Resources/

They did so to obtain information about the European rabbit ("Oryctolagus"), but when I followed the same instructions and specified Mus musculus instead, no record was found. And when I look in the orgs variable I created as shown above, per the instructions in the tutorial, I have 1145 entries, none of which is human or mouse.

Thanks. 

Eric

Reply by efoss, written 24 months ago (modified 24 months ago)

P.S. Here are some of the length commands you entered, run on my orgs, which was created as described above:

> length(orgs)
[1] 1145
> length(orgs$species)
[1] 1145
> length(table(orgs$species))
[1] 1145
> query(orgs, "Homo.sapiens")
AnnotationHub with 0 records
# snapshotDate(): 2015-08-26 
> 
Reply by efoss, written 24 months ago

Thanks for showing the output. Your sessionInfo() output shows the version of the AnnotationHub package (the code) you're using, AnnotationHub_2.0.4, and snapshotDate() shows the date when the data in the database were last changed.
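
As a rough sketch of how the two versions can be looked up separately: `packageVersion()` is a standard function from R's utils package, while `code_version` is a hypothetical wrapper added here only so the call does not fail when a package is absent. The commented-out lines assume AnnotationHub is installed and the hub is reachable.

```r
# Hypothetical wrapper: installed version of a package as a string,
# or NA if the package is not installed.
code_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

code_version("AnnotationHub")   # version of the code; NA if not installed

# Version of the data (needs a hub object, so network access or a local cache):
# library(AnnotationHub)
# ah <- AnnotationHub()
# snapshotDate(ah)
```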

Several weeks ago we had a problem with the OrgDbs not being tagged with the correct biocVersion. The problem was fixed in October, probably after the snapshotDate of 2015-08-26 that you're using. The most recent snapshotDate is 2015-11-19 (shown in my output).

The data should automatically update when you load the package and create a new hub object in a fresh session:

library(AnnotationHub)

ah <- AnnotationHub()

You should see the 2015-11-19 date included in possibleDates(ah). If the data don't update and you are still having problems, you can try removing the cache; see ?removeCache.
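
One way to script that check is sketched below. `snapshot_is_stale` is a hypothetical helper, not an AnnotationHub function; it assumes the dates are given as "YYYY-MM-DD" strings (or Date objects), as in the output shown in this thread. The commented-out part uses snapshotDate(), possibleDates() and removeCache() from AnnotationHub and needs the package plus network access; note that removeCache() deletes the local cache, so see ?removeCache before running it.

```r
# Hypothetical helper: TRUE when the current snapshot date is older than
# the newest date on offer.
snapshot_is_stale <- function(current, available) {
  as.Date(current) < max(as.Date(available))
}

# The two dates from this thread:
snapshot_is_stale("2015-08-26", "2015-11-19")
# returns TRUE

# Against a real hub:
# library(AnnotationHub)
# ah <- AnnotationHub()
# if (snapshot_is_stale(snapshotDate(ah), possibleDates(ah))) {
#   removeCache(ah)         # drop the stale local cache
#   ah <- AnnotationHub()   # create a fresh hub object
# }
```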

Valerie

Reply by Valerie Obenchain, written 24 months ago

Hi Valerie, 

That fixed it. Thanks so much for the help. 

Eric

Reply by efoss, written 24 months ago