Question: Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?

3.8 years ago by efoss10 (United States)

In the course of following an online introduction to AnnotationHub, I came across the following code:

library("AnnotationHub")
ah <- AnnotationHub()
orgs <- subset(ah, ah$rdataclass == "OrgDb")

This provides me with data from 1145 organisms, but missing are such important organisms as "Homo sapiens" and "Mus musculus". Why is this?

Thank you.

Eric

Answer: Why are important organisms missing from AnnotationHub() $rdataclass == "OrgDb"?

3.8 years ago by Valerie Obenchain (United States)

Hi Eric,

I'm able to retrieve the Homo.sapiens OrgDb.

> length(orgs)
[1] 1019
> length(table(orgs$species))
[1] 1017
> query(orgs, "Homo.sapiens")
AnnotationHub with 1 record
# snapshotDate(): 2015-11-19
# names(): AH49582
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Homo sapiens
# $rdataclass: OrgDb
# $title: org.Hs.eg.db.sqlite
# $description: NCBI gene ID based annotations about Homo sapiens
# $taxonomyid: 9606
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation
# retrieve record with 'object[["AH49582"]]'

> snapshotDate(ah)
[1] "2015-11-19"

What version of AnnotationHub are you using? Can you show the output of sessionInfo() and the code that caused the problem (i.e., the call that tried to extract Homo.sapiens but couldn't)?

Valerie

Hi Valerie,

Here is my sessionInfo():

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] AnnotationHub_2.0.4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.2                  IRanges_2.2.9                digest_0.6.8
[4] mime_0.4                     GenomeInfoDb_1.4.3           R6_2.1.1
[7] xtable_1.8-0                 DBI_0.3.1                    stats4_3.2.2
[10] magrittr_1.5                 RSQLite_1.0.0                BiocInstaller_1.18.5
[13] httr_1.0.0                   stringi_1.0-1                curl_0.9.4
[16] S4Vectors_0.6.6              tools_3.2.2                  stringr_1.0.0
[19] Biobase_2.28.0               shiny_0.12.2                 httpuv_1.3.3
[22] parallel_3.2.2               BiocGenerics_0.14.0          AnnotationDbi_1.30.1
[25] htmltools_0.2.6              interactiveDisplayBase_1.6.1

I'm not sure which version of AnnotationHub I'm using. How do I determine that? Regardless, I think it must be up to date because I downloaded it just yesterday using biocLite.

Could my problem be that I specified rdataclass to be "OrgDb"? I did so because I was following a tutorial here:

https://www.bioconductor.org/help/workflows/annotation/Annotation_Resources/

They did so to obtain information about the European rabbit ("Oryctolagus"), but when I followed the same instructions and specified Mus musculus instead, no information was returned. And when I look in the orgs variable I created as shown above, per the tutorial's instructions, I have 1145 entries, none of which is human or mouse.

Thanks.

Eric

P.S. Here are some of the length commands you entered, run on my orgs, which was created as described above:

> length(orgs)
[1] 1145
> length(orgs$species)
[1] 1145
> length(table(orgs$species))
[1] 1145
> query(orgs, "Homo.sapiens")
AnnotationHub with 0 records
# snapshotDate(): 2015-08-26
> 

Thanks for showing the output. Your sessionInfo() output shows the version of AnnotationHub (code) you're using and the snapshotDate() shows the date when data in the db were last changed.
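
For reference, both pieces of information can also be queried directly instead of being read off the sessionInfo() output. The following is a minimal sketch, assuming AnnotationHub is installed and the hub is reachable over the network; the variable names pkg_version and snap are only illustrative.

```r
library(AnnotationHub)

# Version of the installed AnnotationHub package (the code)
pkg_version <- packageVersion("AnnotationHub")
print(pkg_version)

# Snapshot date of the metadata database this hub object uses (the data)
ah <- AnnotationHub()
snap <- snapshotDate(ah)
print(snap)
```

The two values are independent: a recently installed package can still be paired with an older snapshot, which is what the outputs in this thread show.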

Several weeks ago we had a problem with the OrgDbs not being tagged with the correct biocVersion. The problem was fixed in October, probably after the snapshotDate of 2015-08-26 that you're using. The most recent snapshotDate is 2015-11-19 (shown in my output).

The data should automatically update when you load the package and create a new hub object in a fresh session:

library(AnnotationHub)

ah <- AnnotationHub()

You should see the 2015-11-19 date included in possibleDates(ah). If the data don't update and you are still having problems, you can try removing the cache; see ?removeCache.
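
The check and the fallback described above might look like the following. This is a sketch rather than a prescribed procedure: it assumes network access, and the arguments accepted by removeCache() may differ between AnnotationHub versions, so consult ?removeCache before running the last step.

```r
library(AnnotationHub)

ah <- AnnotationHub()

# Snapshot currently in use, and the snapshots available to this installation
print(snapshotDate(ah))
dates <- possibleDates(ah)
print(tail(dates))

# Re-run the original subset; with a current snapshot, the human and mouse
# OrgDb records are expected to show up in these queries
orgs <- subset(ah, ah$rdataclass == "OrgDb")
print(query(orgs, "Homo sapiens"))
print(query(orgs, "Mus musculus"))

# Last resort if the snapshot stays stale: delete the local cache, then
# restart R and create a new hub object. Left commented out because it
# removes downloaded files.
# removeCache(ah)
```

If the queries still return 0 records after a fresh session, comparing snapshotDate(ah) with the latest entry in possibleDates(ah) shows whether the hub is still using an old snapshot.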

Valerie

Hi Valerie,

That fixed it. Thanks so much for the help.

Eric