Question: Easy way to turn an OrgDb object from AnnotationHub into a package?
Jenny Drnevich (1.9k), United States, 19 months ago, wrote:

Hi all,

Is there an easy way to turn an OrgDb object from AnnotationHub into a package? OrgDb objects work with the nifty select(), keytypes(), etc. accessor functions, and some functions like goana() can use them fine, but other functions that try to load the annotation as a package end up throwing errors. I did some searching and found brief mentions of this issue here (A: Error in makeOrgPackageFromNCBI for Medicago truncatula) and here (how to use "non-standard" species for KEGG / GO analysis in limma?), but no answers. Is the best answer currently to use AnnotationForge and makeOrgPackageFromNCBI()?

Thanks,

Jenny

> library(AnnotationHub)
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed

> library(pathview)
Loading required package: org.Hs.eg.db
Loading required package: AnnotationDbi
#lines removed

> ah <- AnnotationHub()
snapshotDate(): 2016-10-11
> query(ah, "Nannospalax")
AnnotationHub with 1 record
# snapshotDate(): 2016-10-11 
# names(): AH52167
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Nannospalax galili
# $rdataclass: OrgDb
# $title: org.Nannospalax_galili.eg.sqlite
# $description: NCBI gene ID based annotations about Nannospalax g...
# $taxonomyid: 1026970
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uni...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH52167"]]' 
> org.Ng.eg.db <- ah[["AH52167"]]
loading from cache ‘C:/Users/drnevich/Documents/AppData/.AnnotationHub/58905’
Warning message:
vfs customization not available on this platform. Ignoring value: vfs = unix-none 
> data(korg)
> #Need to add spalax to pathview's korg database cause it's not in for some reason
> korg <- rbind(korg, c("ngi","Nannospalax galili", "spalax", "1", "103724393","103724393"))
> pathview(gene.data = keys(org.Ng.eg.db, keytype = "ENTREZID")[1:1000], 
+          pathway.id = "04080", kegg.dir = "BasePathwayMaps",
+          species = "ngi", out.suffix = "test", kegg.native = T,
+          same.layer = F, gene.annotpkg = org.Ng.eg.db)
Info: Downloading xml files for ngi04080, 1/1 pathways..
Info: Downloading png files for ngi04080, 1/1 pathways..
Error in !pkg.on : invalid argument type
In addition: Warning message:
In is.na(gene.annotpkg) :
  is.na() applied to non-(list or vector) of type 'S4'
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] pathview_1.14.0      org.Hs.eg.db_3.4.0   AnnotationDbi_1.36.1
[4] IRanges_2.8.1        S4Vectors_0.12.1     Biobase_2.34.0      
[7] AnnotationHub_2.6.4  BiocGenerics_0.20.0 

loaded via a namespace (and not attached):
 [1] graph_1.52.0                  Rcpp_0.12.9                  
 [3] KEGGgraph_1.32.0              XVector_0.14.0               
 [5] zlibbioc_1.20.0               xtable_1.8-2                 
 [7] R6_2.2.0                      httr_1.2.1                   
 [9] tools_3.3.2                   grid_3.3.2                   
[11] png_0.1-7                     DBI_0.5-1                    
[13] htmltools_0.3.5               yaml_2.1.14                  
[15] digest_0.6.12                 interactiveDisplayBase_1.12.0
[17] shiny_1.0.0                   Rgraphviz_2.18.0             
[19] curl_2.3                      KEGGREST_1.14.0              
[21] memoise_1.0.0                 RSQLite_1.1-2                
[23] mime_0.5                      BiocInstaller_1.24.0         
[25] Biostrings_2.42.1             XML_3.98-1.5                 
[27] httpuv_1.3.3   
modified 19 months ago by Valerie Obenchain ♦♦ (6.6k) • written 19 months ago by Jenny Drnevich (1.9k)
James W. MacDonald (47k), United States, 19 months ago, wrote:

First copy the cached database somewhere writable under a new name, then build and install the package:

> library(AnnotationForge)
> # 'hub' is the AnnotationHub() object ('ah' in the question)
> file.copy(AnnotationHub::cache(hub["AH52167"]), "./org.Ng.eg.sqlite")
loading from cache 'C:/Users/jmacdon/AppData/Roaming/AppData/.AnnotationHub/58905'
[1] TRUE

> seed <- new("AnnDbPkgSeed", Package = "org.Ng.eg.db", Version = "0.0.1",
+             Author = "James W. MacDonald",
+             Maintainer = "James W. MacDonald <jmacdon@uw.edu>",
+             PkgTemplate = "NOSCHEMA.DB", AnnObjPrefix = "org.Ng.eg",
+             organism = "Nannospalax galili", species = "Nannospalax galili",
+             biocViews = "annotation", manufacturerUrl = "none",
+             manufacturer = "none", chipName = "none")
> makeAnnDbPkg(seed, "org.Ng.eg.sqlite")

> install.packages("org.Ng.eg.db/", type = "source", repos = NULL)
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/3.3'
(as 'lib' is unspecified)
* installing *source* package 'org.Ng.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
*** arch - x64
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
Warning: vfs customization not available on this platform. Ignoring value: vfs = unix-none
* DONE (org.Ng.eg.db)
written 19 months ago by James W. MacDonald (47k)
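The three steps in the answer above can be wrapped in a single helper. This is a minimal sketch, not an AnnotationForge function: the name `orgdb_to_package()` and its arguments are hypothetical, the seed fields simply mirror the ones used in the answer, and `template` defaults to `"NOSCHEMA.DB"` on the assumption that the hub record uses that schema (the sqlite `metadata` table tells you which one it actually is).

```r
# Hypothetical helper (not part of AnnotationForge): turn an OrgDb record
# from AnnotationHub into an installed package, following the steps above.
orgdb_to_package <- function(hub, id, pkg_name, prefix, organism,
                             author, maintainer,
                             version = "0.0.1",
                             template = "NOSCHEMA.DB") {
  for (pkg in c("AnnotationHub", "AnnotationForge")) {
    if (!requireNamespace(pkg, quietly = TRUE))
      stop("package '", pkg, "' is required")
  }

  # 1. Copy the cached sqlite file somewhere writable under a new name.
  db_file <- paste0(prefix, ".sqlite")
  if (!file.copy(AnnotationHub::cache(hub[id]), db_file, overwrite = TRUE))
    stop("could not copy the cached database for ", id)

  # 2. Build the package source directory from a seed object. The class is
  #    looked up in AnnotationForge's namespace so it need not be attached.
  seed_class <- methods::getClass("AnnDbPkgSeed",
                                  where = asNamespace("AnnotationForge"))
  seed <- methods::new(seed_class,
    Package = pkg_name, Version = version,
    Author = author, Maintainer = maintainer,
    PkgTemplate = template, AnnObjPrefix = prefix,
    organism = organism, species = organism,
    biocViews = "annotation", manufacturerUrl = "none",
    manufacturer = "none", chipName = "none")
  AnnotationForge::makeAnnDbPkg(seed, db_file)

  # 3. Install the generated source package from the local directory.
  utils::install.packages(pkg_name, type = "source", repos = NULL)
  invisible(pkg_name)
}
```

For the record in the question, a call might look like `orgdb_to_package(ah, "AH52167", "org.Ng.eg.db", "org.Ng.eg", "Nannospalax galili", author = "Your Name", maintainer = "Your Name <you@example.org>")`, with placeholder author details. Afterwards, pathview would be given the package name as a character string (`gene.annotpkg = "org.Ng.eg.db"`) rather than the OrgDb object; the `is.na()` warning in the question suggests it expects a string there.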

Thanks, Jim - it worked!

written 19 months ago by Jenny Drnevich (1.9k)

Hello Jim,

I am working with Plasmopara halstedii and would like to make a package (i.e., org.Ph.eg.db) from AnnotationHub, but I got the following error:

> ah <- AnnotationHub()
snapshotDate(): 2017-10-27
> query(ah, "halstedii")
AnnotationHub with 0 records
# snapshotDate(): 2017-10-27 
Warning message:
call dbDisconnect() when finished working with a connection 

Can you suggest a way forward?

written 29 days ago by sbbinfo900

You got a warning because you are evidently using an old R/Bioconductor installation. I don't get that warning:

> hub <- AnnotationHub()
snapshotDate(): 2018-04-30
> query(hub, c("plasmopara","orgdb"))
AnnotationHub with 0 records
# snapshotDate(): 2018-04-30

But the fact that I get zero records indicates that there isn't an OrgDb on AnnotationHub for this species. In addition, NCBI lists only 19 genes for this virus, and some of those are only partial CDSs. So it is not a well-annotated organism, as far as NCBI is concerned.

written 29 days ago by James W. MacDonald (47k)

Thanks!

So if I have transcriptomic data for this species, would using its genome and annotation as the reference generate errors because of the partial CDSs?

Please let me know how to handle the reference genome for Plasmopara halstedii.

written 28 days ago by sbbinfo900

There are different levels of annotation. NCBI does have a genome for Plasmopara halstedii, and it has 15,469 predicted proteins (https://www.ncbi.nlm.nih.gov/genome/?term=Plasmopara+halstedii). So the .gff file should contain the locations of the exons making up these proteins, which is one level of annotation. If you click on the "protein count: 15469" link (https://www.ncbi.nlm.nih.gov/genome/proteins/42828?genome_assembly_id=263864), there is some information on what these proteins are, so that level of annotation is also available, to a small degree. What is not available are sequenced cDNAs. That said, 15,469 genes seems far too many for a virus; I have no idea what is going on! Regardless, an OrgDb package for this species is not available through AnnotationHub. Hopefully the protein names are contained in the .gff file and you can pull them out from there.

written 28 days ago by Jenny Drnevich (1.9k)
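Pulling protein names out of the GFF file, as suggested above, could look something like the base-R sketch below. It assumes an NCBI-style GFF3 in which CDS features carry `protein_id=` and `product=` attributes in column 9; the function name `gff_products()` is hypothetical, and other annotation sources may use different attribute keys.

```r
# Hypothetical helper: extract protein IDs and product names from GFF3 lines.
gff_products <- function(lines, feature = "CDS") {
  # Drop comment/directive lines and blank lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)

  # Keep well-formed 9-column records of the requested feature type.
  keep <- vapply(fields, function(f) length(f) == 9 && f[3] == feature,
                 logical(1))
  attrs <- vapply(fields[keep], function(f) f[9], character(1))

  # Value of one key=value attribute from column 9, NA when absent.
  get_attr <- function(a, key) {
    pattern <- paste0("(^|;)", key, "=([^;]*)")
    out <- rep(NA_character_, length(a))
    hit <- grepl(pattern, a, perl = TRUE)
    out[hit] <- sub(paste0(".*", pattern, ".*"), "\\2", a[hit], perl = TRUE)
    out
  }
  # GFF3 escapes reserved characters as %XX; URLdecode() reverses that.
  decode <- function(x) {
    vapply(x, function(s) if (is.na(s)) s else utils::URLdecode(s),
           character(1), USE.NAMES = FALSE)
  }

  res <- data.frame(protein_id = get_attr(attrs, "protein_id"),
                    product = decode(get_attr(attrs, "product")),
                    stringsAsFactors = FALSE)
  # A CDS split across exons appears on several lines; keep one row each.
  unique(res[!is.na(res$protein_id), , drop = FALSE])
}
```

Usage would be along the lines of `gff_products(readLines("genomic.gff"))`, where the file name is a placeholder for whichever GFF3 you downloaded from NCBI.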

Hello Jim,

AnnotationForge successfully created a database for P. halstedii from NCBI, using the latest R (3.5.1) and the biomaRt and GenomeInfoDb packages.

Thanks for help!

written 27 days ago by sbbinfo900
Valerie Obenchain ♦♦ (6.6k), United States, 19 months ago, wrote:

Hi Jenny,

The OrgDb object is essentially the sqlite file, and you can make a package from the sqlite file with AnnotationForge::makeAnnDbPkg(). See the man page example for creating a 'seed', then pass the seed and the path to the sqlite file to makeAnnDbPkg(). For the PkgTemplate argument, use NCBIORG.DB. All templates are listed in AnnotationForge/inst/extdata/GentlemanLab/ANNDBPKG-INDEX.TXT.

Valerie

written 19 months ago by Valerie Obenchain ♦♦ (6.6k)
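The template index mentioned above can be read from an installed copy of AnnotationForge. Installed packages drop the `inst/` prefix, so the path below is our assumption of where that file ends up; `system.file()` returns an empty string when the file or the package is missing, which the sketch checks for. The exact layout of the index may differ between AnnotationForge versions.

```r
# Locate the template index shipped with AnnotationForge. Installed packages
# drop the "inst/" prefix, so inst/extdata/... becomes extdata/...
index_file <- system.file("extdata", "GentlemanLab", "ANNDBPKG-INDEX.TXT",
                          package = "AnnotationForge")

if (nzchar(index_file)) {
  # Print the first lines of the index to see the available templates.
  cat(head(readLines(index_file), 20), sep = "\n")
} else {
  message("ANNDBPKG-INDEX.TXT not found; is AnnotationForge installed?")
}
```

Whatever the index lists, the chosen template still has to match the schema recorded in the sqlite file itself.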

Hi Valerie,

Setting PkgTemplate = "NCBIORG.DB" led to the following error:

> seed <- new("AnnDbPkgSeed", Package = "org.Ng.eg.db", 
+             Version = "0.0.1", Author = "Jenny Drnevich", 
+             Maintainer = "Jenny Drnevich <drnevich@illinois.edu>", 
+             PkgTemplate = "NCBIORG.DB", AnnObjPrefix = "org.Ng.eg", 
+             organism = "Nannospalax galili", species = "Nannospalax galili", 
+             biocViews = "annotation", manufacturerUrl = "none", manufacturer = "none", chipName = "none")
> 
> makeAnnDbPkg(seed, "org.Ng.eg.sqlite")
Error in if (species == "Anopheles gambiae") { : 
  argument is of length zero

However, using Jim's suggested PkgTemplate = "NOSCHEMA.DB" seemed to work fine and now I can use pathview. Does this matter at all?

Thanks!

Jenny

modified 19 months ago • written 19 months ago by Jenny Drnevich (1.9k)

You have to use the schema that matches the DB you downloaded:

> library(RSQLite)
> con <- dbConnect(SQLite(), "org.Ng.eg.sqlite")
> dbGetQuery(con, "select * from metadata;")
                name              value
1    DBSCHEMAVERSION                2.1
2           DBSCHEMA        NOSCHEMA_DB
3           ORGANISM Nannospalax galili
4            SPECIES Nannospalax galili
5          CENTRALID                GID
6        Taxonomy ID            1026970
7            Db type              OrgDb
8 Supporting package      AnnotationDbi

 

written 19 months ago by James W. MacDonald (47k)
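The metadata check above can be turned into a small helper that reads `DBSCHEMA` from the `metadata` table and derives a `PkgTemplate` name. The mapping (replacing the trailing `_DB` with `.DB`, so `NOSCHEMA_DB` becomes `NOSCHEMA.DB`) is an assumption generalized from this one example, and the function name `template_for_sqlite()` is hypothetical; confirm the result against the templates AnnotationForge actually ships.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: read the schema recorded in an AnnotationDbi-style
# sqlite file and derive a matching makeAnnDbPkg() template name.
template_for_sqlite <- function(db_file) {
  # dbConnect() would silently create a missing file, so check first.
  stopifnot(file.exists(db_file))
  con <- dbConnect(SQLite(), db_file)
  on.exit(dbDisconnect(con), add = TRUE)

  meta <- dbGetQuery(con, "SELECT name, value FROM metadata")
  schema <- meta$value[meta$name == "DBSCHEMA"]
  if (length(schema) != 1)
    stop("expected exactly one DBSCHEMA row in ", db_file)

  # Assumed naming convention, based on this thread: NOSCHEMA_DB -> NOSCHEMA.DB
  sub("_DB$", ".DB", schema)
}
```

Since the metadata shown above records `NOSCHEMA_DB`, `template_for_sqlite("org.Ng.eg.sqlite")` would return `"NOSCHEMA.DB"` for that file, the template that worked in the accepted answer.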

Right. Thanks Jim.

Val

written 19 months ago by Valerie Obenchain ♦♦ (6.6k)