Hello,
I ran into a puzzling situation with AnnotationHub while trying to retrieve updated annotations for rat. NCBI released a new rat gene model set at the end of July (early August by the time it propagated through their FTP server), and we used it for a recent RNA-Seq experiment. Bioconductor's org.Rn.eg.db package was built in March/April 2016, so it is missing ~700 new genes. I tried AnnotationHub to get updated annotations. Its snapshotDate() is 2016-08-15, which should be just after the updated annotations were released, yet the OrgDb retrieved for rat has an older EGSOURCEDATE (2015-Aug11) than org.Rn.eg.db does (2016-Mar14). I checked mouse and it has the same problem. Why are the OrgDb objects in AnnotationHub not current?
Thanks,
Jenny
> library(AnnotationHub)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply,
parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated,
eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit
> library(org.Rn.eg.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To
cite Bioconductor, see 'citation("Biobase")', and for packages
'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:AnnotationHub’:
cache
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
colMeans, colSums, expand.grid, rowMeans, rowSums
> library(org.Mm.eg.db)
>
>
> ah = AnnotationHub()
snapshotDate(): 2016-08-15
>
> #See what they have for Rattus norvegicus, from NCBI and OrgDB
>
> query(ah, c("OrgDB", "NCBI", "Rattus norvegicus"))
AnnotationHub with 1 record
# snapshotDate(): 2016-08-15
# names(): AH49585
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Rattus norvegicus
# $rdataclass: OrgDb
# $title: org.Rn.eg.db.sqlite
# $description: NCBI gene ID based annotations about Rattus norvegicus
# $taxonomyid: 10116
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/pub/current_fasta
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation
# retrieve record with 'object[["AH49585"]]'
>
>
> ah[["AH49585"]]
loading from cache ‘C:/Users/drnevich/Documents/AppData/.AnnotationHub/56315’
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: RAT_DB
| ORGANISM: Rattus norvegicus
| SPECIES: Rat
| EGSOURCEDATE: 2015-Aug11
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 10116
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20150808
| GOEGSOURCEDATE: 2015-Aug11
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Rattus norvegicus)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn6
| GPSOURCEDATE: 2014-Aug1
| ENSOURCEDATE: 2015-Jul16
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Thu Aug 20 15:37:19 2015
Please see: help('select') for usage information
>
>
> #compare EGSOURCEDATE with org.Rn.eg.db:
>
> org.Rn.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: RAT_DB
| ORGANISM: Rattus norvegicus
| SPECIES: Rat
| EGSOURCEDATE: 2016-Mar14
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 10116
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20160305
| GOEGSOURCEDATE: 2016-Mar14
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Rattus norvegicus)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn6
| GPSOURCEDATE: 2014-Aug1
| ENSOURCEDATE: 2016-Mar9
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Wed Mar 23 15:52:15 2016
Please see: help('select') for usage information
>
>
> #Try mouse:
>
> query(ah, c("OrgDB", "NCBI", "Mus musculus"))
AnnotationHub with 1 record
# snapshotDate(): 2016-08-15
# names(): AH49583
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Mus musculus
# $rdataclass: OrgDb
# $title: org.Mm.eg.db.sqlite
# $description: NCBI gene ID based annotations about Mus musculus
# $taxonomyid: 10090
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/pub/current_fasta
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation
# retrieve record with 'object[["AH49583"]]'
>
>
> ah[["AH49583"]]
loading from cache ‘C:/Users/drnevich/Documents/AppData/.AnnotationHub/56313’
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: MOUSE_DB
| ORGANISM: Mus musculus
| SPECIES: Mouse
| EGSOURCEDATE: 2015-Aug11
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 10090
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20150808
| GOEGSOURCEDATE: 2015-Aug11
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Mus musculus)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/mm10
| GPSOURCEDATE: 2012-Mar8
| ENSOURCEDATE: 2015-Jul16
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Thu Aug 20 15:49:03 2015
Please see: help('select') for usage information
>
> #compare EGSOURCEDATE with org.Mm.eg.db:
>
> org.Mm.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: MOUSE_DB
| ORGANISM: Mus musculus
| SPECIES: Mouse
| EGSOURCEDATE: 2016-Mar14
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 10090
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20160305
| GOEGSOURCEDATE: 2016-Mar14
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Mus musculus)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/mm10
| GPSOURCEDATE: 2012-Mar8
| ENSOURCEDATE: 2016-Mar9
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Wed Mar 23 15:59:16 2016
Please see: help('select') for usage information
>
>
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Mm.eg.db_3.3.0 org.Rn.eg.db_3.3.0 AnnotationDbi_1.34.4 IRanges_2.6.1
[5] S4Vectors_0.10.3 Biobase_2.32.0 AnnotationHub_2.4.2 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 digest_0.6.10
[3] mime_0.5 R6_2.1.3
[5] xtable_1.8-2 DBI_0.5
[7] RSQLite_1.0.0 BiocInstaller_1.22.3
[9] httr_1.2.1 curl_1.2
[11] tools_3.3.1 shiny_0.13.2
[13] httpuv_1.3.3 htmltools_0.3.5
[15] interactiveDisplayBase_1.10.3
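
The date comparison made in the session above can also be scripted. What follows is a minimal sketch in base R, assuming the EGSOURCEDATE strings always have the "YYYY-MonDD" form printed above. The helper name parse_source_date is hypothetical, and the three dates are copied by hand from the output rather than read from the objects.

```r
# Hypothetical helper: parse an OrgDb source date such as "2016-Mar14".
# It uses the built-in month.abb constant, so it does not depend on the locale.
parse_source_date <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]{4})-([A-Za-z]{3})([0-9]{1,2})$", x))[[1]]
  if (length(parts) != 4) stop("unexpected date format: ", x)
  month <- match(parts[3], month.abb)
  if (is.na(month)) stop("unknown month abbreviation: ", parts[3])
  as.Date(sprintf("%s-%02d-%02d", parts[2], month, as.integer(parts[4])))
}

# Values copied from the session output above.
hub_date <- parse_source_date("2015-Aug11")  # EGSOURCEDATE of AH49585
pkg_date <- parse_source_date("2016-Mar14")  # EGSOURCEDATE of org.Rn.eg.db 3.3.0
snapshot <- as.Date("2016-08-15")            # snapshotDate() of the hub

hub_is_older        <- hub_date < pkg_date                # TRUE if the hub record is staler
days_behind_package <- as.integer(pkg_date - hub_date)    # gap between the two sources
days_behind_hub     <- as.integer(snapshot - hub_date)    # gap to the hub snapshot
```

The same check applies to the mouse records, since they report the same two EGSOURCEDATE values.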

The standard organism OrgDbs in our repository
http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb
are built from data downloaded from multiple sources (UCSC, NCBI, Ensembl, etc.). The OrgDbs for other, non-standard organisms in AnnotationHub are made with makeOrgPackageFromNCBI(), which downloads from
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
ftp://ftp.geneontology.org/pub/go/godata
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping
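
For illustration only, a call to makeOrgPackageFromNCBI() (from the AnnotationForge package) for rat might look like the sketch below. The wrapper name build_rat_orgdb, the version string, and the author/maintainer values are placeholders, not anything from this thread. The resulting package would not necessarily match the contents of the standard org.Rn.eg.db, and the download is large and may take a long time.

```r
# Hypothetical wrapper: build a rat OrgDb from the current NCBI files.
# Nothing is downloaded until the function is actually called.
build_rat_orgdb <- function(out_dir = tempdir()) {
  if (!requireNamespace("AnnotationForge", quietly = TRUE)) {
    stop("AnnotationForge is not installed")
  }
  AnnotationForge::makeOrgPackageFromNCBI(
    version    = "0.1",                             # placeholder package version
    author     = "Some One <someone@example.org>",  # placeholder
    maintainer = "Some One <someone@example.org>",  # placeholder
    outputDir  = out_dir,                           # where the package source is written
    tax_id     = "10116",                           # Rattus norvegicus, as in the hub record
    genus      = "Rattus",
    species    = "norvegicus"
  )
}
```

Calling build_rat_orgdb("some/output/dir") would then start the download and the build.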
Valerie