Easy way to turn an OrgDb object from AnnotationHub into a package?
Jenny Drnevich (@jenny-drnevich-2812), United States

Hi all,

Is there an easy way to turn an OrgDb object from AnnotationHub into a package? OrgDb objects work with the nifty select(), keytypes(), etc. accessor functions, and some functions such as goana() use them fine, but other functions that try to load the annotation as a package end up throwing errors. I did some searching and found brief mentions of this issue here (A: Error in makeOrgPackageFromNCBI for Medicago truncatula) and here (how to use "non-standard" species for KEGG / GO analysis in limma?), but no answers. Is the best current approach to use AnnotationForge and makeOrgPackageFromNCBI()?

Thanks,

Jenny


> library(AnnotationHub)
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed

> library(pathview)
Loading required package: org.Hs.eg.db
Loading required package: AnnotationDbi
#lines removed

> ah <- AnnotationHub()
snapshotDate(): 2016-10-11
> query(ah, "Nannospalax")
AnnotationHub with 1 record
# snapshotDate(): 2016-10-11 
# names(): AH52167
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Nannospalax galili
# $rdataclass: OrgDb
# $title: org.Nannospalax_galili.eg.sqlite
# $description: NCBI gene ID based annotations about Nannospalax g...
# $taxonomyid: 1026970
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uni...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH52167"]]' 
> org.Ng.eg.db <- ah[["AH52167"]]
loading from cache ‘C:/Users/drnevich/Documents/AppData/.AnnotationHub/58905’
Warning message:
vfs customization not available on this platform. Ignoring value: vfs = unix-none 
> data(korg)
> # Need to add spalax to pathview's korg database because it's not included for some reason
> korg <- rbind(korg, c("ngi","Nannospalax galili", "spalax", "1", "103724393","103724393"))
> pathview(gene.data = keys(org.Ng.eg.db, keytype = "ENTREZID")[1:1000], 
+          pathway.id = "04080", kegg.dir = "BasePathwayMaps",
+          species = "ngi", out.suffix = "test", kegg.native = T,
+          same.layer = F, gene.annotpkg = org.Ng.eg.db)
Info: Downloading xml files for ngi04080, 1/1 pathways..
Info: Downloading png files for ngi04080, 1/1 pathways..
Error in !pkg.on : invalid argument type
In addition: Warning message:
In is.na(gene.annotpkg) :
  is.na() applied to non-(list or vector) of type 'S4'
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] pathview_1.14.0      org.Hs.eg.db_3.4.0   AnnotationDbi_1.36.1
[4] IRanges_2.8.1        S4Vectors_0.12.1     Biobase_2.34.0      
[7] AnnotationHub_2.6.4  BiocGenerics_0.20.0 

loaded via a namespace (and not attached):
 [1] graph_1.52.0                  Rcpp_0.12.9                  
 [3] KEGGgraph_1.32.0              XVector_0.14.0               
 [5] zlibbioc_1.20.0               xtable_1.8-2                 
 [7] R6_2.2.0                      httr_1.2.1                   
 [9] tools_3.3.2                   grid_3.3.2                   
[11] png_0.1-7                     DBI_0.5-1                    
[13] htmltools_0.3.5               yaml_2.1.14                  
[15] digest_0.6.12                 interactiveDisplayBase_1.12.0
[17] shiny_1.0.0                   Rgraphviz_2.18.0             
[19] curl_2.3                      KEGGREST_1.14.0              
[21] memoise_1.0.0                 RSQLite_1.1-2                
[23] mime_0.5                      BiocInstaller_1.24.0         
[25] Biostrings_2.42.1             XML_3.98-1.5                 
[27] httpuv_1.3.3   
Tags: AnnotationHub, AnnotationDbi, AnnotationForge, pathview
James W. MacDonald (@james-w-macdonald-5106), United States

First copy the cached database somewhere writable and rename it, then build and install the package:

> library(AnnotationForge)
> file.copy(AnnotationHub::cache(hub["AH52167"]), "./org.Ng.eg.sqlite")
loading from cache 'C:/Users/jmacdon/AppData/Roaming/AppData/.AnnotationHub/58905'
[1] TRUE

> seed <- new("AnnDbPkgSeed", Package = "org.Ng.eg.db", Version = "0.0.1",
+             Author = "James W. MacDonald",
+             Maintainer = "James W. MacDonald <jmacdon@uw.edu>",
+             PkgTemplate = "NOSCHEMA.DB", AnnObjPrefix = "org.Ng.eg",
+             organism = "Nannospalax galili", species = "Nannospalax galili",
+             biocViews = "annotation", manufacturerUrl = "none",
+             manufacturer = "none", chipName = "none")
> makeAnnDbPkg(seed, "org.Ng.eg.sqlite")

> install.packages("org.Ng.eg.db/", type = "source", repos = NULL)
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/3.3'
(as 'lib' is unspecified)
* installing *source* package 'org.Ng.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
*** arch - x64
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
* DONE (org.Ng.eg.db)

Thanks, Jim - it worked!


Hello Jim,

I am working with the species Plasmopara halstedii and would like to make a package (i.e. org.Ph.eg.db) from AnnotationHub, but I got this error:

> ah <- AnnotationHub()
snapshotDate(): 2017-10-27
> query(ah, "halstedii")
AnnotationHub with 0 records
# snapshotDate(): 2017-10-27 
Warning message:
call dbDisconnect() when finished working with a connection 

Please suggest a way out.


You got a warning because you are evidently using an old R/Bioconductor installation. I don't get the warning:

> hub <- AnnotationHub()
snapshotDate(): 2018-04-30
> query(hub, c("plasmopara","orgdb"))
AnnotationHub with 0 records
# snapshotDate(): 2018-04-30

But the fact that I get zero records indicates that there is no OrgDb on AnnotationHub for this species. In addition, NCBI says there are only 19 genes for this virus, and about [...] are just partial CDSs, so it is not a well-annotated organism as far as NCBI is concerned.


Thanks!

So if I have transcriptomic data for this species, would using its genome and annotation as the reference generate errors because of the partial CDSs?

Please let me know how to handle the reference genome of Plasmopara halstedii.


There are different levels of annotation. NCBI does have a genome for Plasmopara halstedii, with 15,469 predicted proteins (https://www.ncbi.nlm.nih.gov/genome/?term=Plasmopara+halstedii). So the locations of the exons making up these proteins should be in the .gff file, which is one level of annotation. If you click on the "protein count: 15469" link (https://www.ncbi.nlm.nih.gov/genome/proteins/42828?genome_assembly_id=263864), there is some information on what these proteins are, so that level of annotation is also available, to a small degree. What is not available is sequenced cDNAs, although 15,469 genes in a virus seems way too high; I have no idea what is going on! Regardless, an OrgDb package for this species is not available through AnnotationHub. Hopefully the protein names are contained in the gff file and you can pull them out from there.


Hello Jim,

AnnotationForge successfully created a database for P. halstedii from NCBI using the latest R (3.5.1) with the biomaRt and GenomeInfoDb packages.

Thanks for help!

Valerie Obenchain (@valerie-obenchain-4275), United States

Hi Jenny,

The OrgDb object is essentially the sqlite file, and you can make a package from the sqlite file with AnnotationForge::makeAnnDbPkg(). See the man page example for creating a 'seed', then pass the seed and the path to the sqlite file to makeAnnDbPkg(). For the PkgTemplate argument use NCBIORG.DB. All templates are listed in AnnotationForge/inst/extdata/GentlemanLab/ANNDBPKG-INDEX.TXT.

Valerie


Hi Valerie,

Setting PkgTemplate = "NCBIORG.DB" led to the following error:

> seed <- new("AnnDbPkgSeed", Package = "org.Ng.eg.db", 
+             Version = "0.0.1", Author = "Jenny Drnevich", 
+             Maintainer = "Jenny Drnevich <drnevich@illinois.edu>", 
+             PkgTemplate = "NCBIORG.DB", AnnObjPrefix = "org.Ng.eg", 
+             organism = "Nannospalax galili", species = "Nannospalax galili", 
+             biocViews = "annotation", manufacturerUrl = "none", manufacturer = "none", chipName = "none")
> 
> makeAnnDbPkg(seed, "org.Ng.eg.sqlite")
Error in if (species == "Anopheles gambiae") { : 
  argument is of length zero

However, using Jim's suggested PkgTemplate = "NOSCHEMA.DB" seemed to work fine and now I can use pathview. Does this matter at all?

Thanks!

Jenny


You have to use the schema that matches the DB you downloaded:

> library(RSQLite)
> con <- dbConnect(SQLite(), "org.Ng.eg.sqlite")
> dbGetQuery(con, "select * from metadata;")
                name              value
1    DBSCHEMAVERSION                2.1
2           DBSCHEMA        NOSCHEMA_DB
3           ORGANISM Nannospalax galili
4            SPECIES Nannospalax galili
5          CENTRALID                GID
6        Taxonomy ID            1026970
7            Db type              OrgDb
8 Supporting package      AnnotationDbi
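The check above can be folded into seed construction so that PkgTemplate always follows the database being packaged. Below is a minimal sketch built around a hypothetical helper, `seed_args_from_metadata()` (not an AnnotationForge function), that reads the DBSCHEMA, ORGANISM and SPECIES rows of the metadata table and returns the argument list for `new("AnnDbPkgSeed", ...)`. It assumes the template name is the DBSCHEMA value with its trailing `_DB` replaced by `.DB`, which matches the NOSCHEMA_DB / NOSCHEMA.DB pair in this thread; compare the result against ANNDBPKG-INDEX.TXT before relying on it. The author and maintainer strings in the usage part are placeholders.

```r
# Hypothetical helper (not part of AnnotationForge): build the arguments
# for new("AnnDbPkgSeed", ...) from the 'metadata' table of an OrgDb
# sqlite file, so that PkgTemplate follows the database's own DBSCHEMA.
seed_args_from_metadata <- function(meta, package, version, author, maintainer) {
  lookup <- setNames(as.character(meta$value), as.character(meta$name))
  required <- c("DBSCHEMA", "ORGANISM", "SPECIES")
  absent <- setdiff(required, names(lookup))
  if (length(absent) > 0) {
    stop("metadata table lacks: ", paste(absent, collapse = ", "))
  }
  list(
    Package = package,
    Version = version,
    Author = author,
    Maintainer = maintainer,
    # Assumption: template name = DBSCHEMA with "_DB" suffix -> ".DB"
    # (e.g. NOSCHEMA_DB -> NOSCHEMA.DB, as in this thread).
    PkgTemplate = sub("_DB$", ".DB", lookup[["DBSCHEMA"]]),
    # Prefix = package name without its ".db" suffix (org.Ng.eg.db -> org.Ng.eg).
    AnnObjPrefix = sub("\\.db$", "", package),
    organism = lookup[["ORGANISM"]],
    species = lookup[["SPECIES"]],
    biocViews = "annotation",
    manufacturerUrl = "none",
    manufacturer = "none",
    chipName = "none"
  )
}

# Usage: only runs when the copied sqlite file is in the working directory.
if (file.exists("org.Ng.eg.sqlite")) {
  library(RSQLite)
  library(AnnotationForge)
  con <- dbConnect(SQLite(), "org.Ng.eg.sqlite")
  meta <- dbGetQuery(con, "select * from metadata;")
  dbDisconnect(con)  # close the connection once the metadata is read
  args <- seed_args_from_metadata(
    meta, package = "org.Ng.eg.db", version = "0.0.1",
    author = "Your Name", maintainer = "Your Name <you@example.org>"  # placeholders
  )
  seed <- do.call(new, c(list(Class = "AnnDbPkgSeed"), args))
  makeAnnDbPkg(seed, "org.Ng.eg.sqlite")
}
```

Reading the schema from the file, rather than typing the template by hand, is meant to avoid the NCBIORG.DB mismatch that produced the "argument is of length zero" error.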


Right. Thanks Jim.

Val
