Problem making an OrgDb package for a bacterium (Pseudomonas) using AnnotationHub and AnnotationForge
Manmohit • 0
@7ffaf7cf

I am trying to make an OrgDb package for my bacterium (Pseudomonas aeruginosa PAO1), but when I run the makeAnnDbPkg command it gives me an error that is difficult for me, as a beginner, to resolve.

hub <- AnnotationHub()
query(hub, c("pseudomonas"))
org.PAO1.eg.db <- hub[["AH91625"]]
file.copy(AnnotationHub::cache(hub["AH91625"]), "./org.Pseudomonas aeruginosa PAO1.eg.sqlite")
seed <- new("AnnDbPkgSeed", Package="./org.Pseudomonas aeruginosa PAO1.eg.db", Version="0.0.1", PkgTemplate="NCBIORG.DB", AnnObjPrefix="org.Pseudomonas aeruginosa PAO1.eg", organism="Pseudomonas aeruginosa", species="Pseudomonas aeruginosa PAO1")


makeAnnDbPkg(seed,"./org.Pseudomonas aeruginosa PAO1.eg.sqlite")

Error in initWithDbMetada(x, dbfile) : 
  "metadata" table has unexpected col names


sessionInfo( )
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] shiny_1.7.3                 RSQLite_2.2.19              ggplot2_3.4.0              
 [4] DESeq2_1.36.0               SummarizedExperiment_1.26.1 MatrixGenerics_1.8.1       
 [7] matrixStats_0.63.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[10] MeSHDbi_1.32.0              AnnotationHub_3.4.0         BiocFileCache_2.4.0        
[13] dbplyr_2.2.1                AnnotationForge_1.38.1      AnnotationDbi_1.58.0       
[16] IRanges_2.30.1              S4Vectors_0.34.0            edgeR_3.38.4               
[19] limma_3.52.4                NOISeq_2.40.0               Matrix_1.5-3               
[22] Biobase_2.56.0              BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                  bit64_4.0.5                  
 [3] filelock_1.0.2                RColorBrewer_1.1-3           
 [5] httr_1.4.4                    bslib_0.4.1                  
 [7] tools_4.2.1                   DT_0.26                      
 [9] utf8_1.2.2                    R6_2.5.1                     
[11] colorspace_2.0-3              DBI_1.1.3                    
[13] withr_2.5.0                   tidyselect_1.2.0             
[15] bit_4.0.5                     curl_4.3.3                   
[17] compiler_4.2.1                cli_3.4.1                    
[19] DelayedArray_0.22.0           sass_0.4.4                   
[21] scales_1.2.1                  genefilter_1.78.0            
[23] rappdirs_0.3.3                digest_0.6.30                
[25] rmarkdown_2.18                XVector_0.36.0               
[27] pkgconfig_2.0.3               htmltools_0.5.3              
[29] fastmap_1.1.0                 htmlwidgets_1.5.4            
[31] rlang_1.0.6                   rstudioapi_0.14              
[33] jquerylib_0.1.4               generics_0.1.3               
[35] jsonlite_1.8.3                crosstalk_1.2.0              
[37] BiocParallel_1.30.4           dplyr_1.0.10                 
[39] RCurl_1.98-1.9                magrittr_2.0.3               
[41] GenomeInfoDbData_1.2.8        munsell_0.5.0                
[43] Rcpp_1.0.9                    fansi_1.0.3                  
[45] lifecycle_1.0.3               yaml_2.3.6                   
[47] zlibbioc_1.42.0               grid_4.2.1                   
[49] blob_1.2.3                    parallel_4.2.1               
[51] promises_1.2.0.1              crayon_1.5.2                 
[53] lattice_0.20-45               Biostrings_2.64.1            
[55] annotate_1.74.0               KEGGREST_1.36.3              
[57] locfit_1.5-9.6                knitr_1.41                   
[59] pillar_1.8.1                  geneplotter_1.74.0           
[61] codetools_0.2-18              XML_3.99-0.13                
[63] glue_1.6.2                    BiocVersion_3.15.2           
[65] evaluate_0.18                 BiocManager_1.30.19          
[67] png_0.1-8                     vctrs_0.5.1                  
[69] httpuv_1.6.6                  gtable_0.3.1                 
[71] purrr_0.3.5                   assertthat_0.2.1             
[73] cachem_1.0.6                  xfun_0.35                    
[75] mime_0.12                     xtable_1.8-4                 
[77] later_1.3.0                   survival_3.4-0               
[79] tibble_3.1.8                  memoise_2.0.1                
[81] ellipsis_0.3.2                interactiveDisplayBase_1.34.0
AnnotationHub AnnotationForge Pseudomonas_aeruginosa • 397 views
@james-w-macdonald-5106

Can you explain why you want to do that? You could just use AnnotationHub directly (the resource gets cached, so it is downloaded only once), or you could use saveDb from the AnnotationDbi package if you really don't want to use the cached version.


Hi James, thank you for your reply. I tried using the cached version for my enrichment analysis but I get an error, which is why I tried using AnnotationForge to save it from the cached version. Here is the error I get when I try to use the cached version. Please suggest what I should do.

> hub <- AnnotationHub()
snapshotDate(): 2022-04-25
> query(hub,c("Pseudomonas"))
AnnotationHub with 6 records
# snapshotDate(): 2022-04-25
# $dataprovider: NCBI,DBCLS, PathBank, Inparanoid8
# $species: Pseudomonas aeruginosa PAO1, Pseudomonas aeruginosa
# $rdataclass: SQLiteFile, Tibble, Inparanoid8Db
# additional mcols(): taxonomyid, genome, description, coordinate_1_based,
#   maintainer, rdatadateadded, preparerclass, tags, rdatapath, sourceurl,
#   sourcetype 
# retrieve records with, e.g., 'object[["AH10565"]]' 

             title                                          
  AH10565  | hom.Pseudomonas_aeruginosa.inp8.sqlite         
  AH87076  | pathbank_Pseudomonas_aeruginosa_metabolites.rda
  AH87086  | pathbank_Pseudomonas_aeruginosa_proteins.rda   
  AH91625  | MeSHDb for Pseudomonas aeruginosa PAO1 (v001)  
  AH97892  | MeSHDb for Pseudomonas aeruginosa PAO1 (v002)  
  AH100357 | MeSHDb for Pseudomonas aeruginosa PAO1 (v003)  
> PAO1 <- hub[["AH91625"]]
loading from cache
> PAO1
                                                                        AH91625 
"/Users/Sauer/Library/Caches/org.R-project.R/R/AnnotationHub/3f66b1d23d6_98371" 
> library(clusterProfiler)
> ego <- enrichGO(gene=signif_genes, universe = all_genes, keyType="ENSEMBL",OrgDb=PAO1,ont="BP",pAdjustMethod="BH",qvalueCutoff=0.05,readable=TRUE)
Error in loadNamespace(name) : 
  there is no package called ‘/Users/Sauer/Library/Caches/org.R-project.R/R/AnnotationHub/3f66b1d23d6_98371’

There are two issues here. First, a MeSHDb isn't an OrgDb: a MeSHDb is meant to provide links between NCBI Gene IDs and MeSH IDs, so it's not the right package. Second, you are getting the cache location instead of a connection to the file. I don't know why that is, but it's academic in this situation, as you are trying to use the wrong thing. Unfortunately, there doesn't appear to be an OrgDb for P. aeruginosa:

> query(hub, c("pseudomonas", "orgdb"))
AnnotationHub with 0 records
# snapshotDate(): 2022-10-26

You could try to make your own OrgDb using makeOrgPackageFromNCBI, which is in the AnnotationForge package.
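
Both points can be checked before calling enrichGO. The sketch below is only illustrative: `looks_like_orgdb` is a hypothetical helper, not part of any package, and the path is the one printed earlier in this thread. The AnnotationHub part runs only in an interactive session with the package installed and network access.

```r
# Hypothetical helper (not from any package): enrichGO needs an OrgDb object,
# not a file path, so test the class before passing the value on.
looks_like_orgdb <- function(x) inherits(x, "OrgDb")

# What hub[["AH91625"]] returned earlier in this thread: a named character path.
PAO1 <- c(AH91625 = "/Users/Sauer/Library/Caches/org.R-project.R/R/AnnotationHub/3f66b1d23d6_98371")
looks_like_orgdb(PAO1)  # FALSE: a character vector is not an OrgDb

# Optional live check; requires AnnotationHub and network access.
if (interactive() && requireNamespace("AnnotationHub", quietly = TRUE)) {
  hub <- AnnotationHub::AnnotationHub()
  hits <- AnnotationHub::query(hub, "Pseudomonas")
  # Tabulate the classes of the matching records; none was "OrgDb" in the
  # snapshots shown in this thread.
  print(table(hits$rdataclass))
}
```

If the helper returns FALSE, the value should not be passed as the OrgDb argument.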


Hi James, thanks again for correcting me. As suggested, I tried making my own OrgDb from NCBI, and I get the error shown below. Here is my code:

makeOrgPackageFromNCBI(version="0.1", author="Manmohit Kalia <mkalia@binghamton.edu>", maintainer="Manmohit Kalia <mkalia@binghamton.edu>", outputDir=".", tax_id="208964", genus="Pseudomonas", species="aeruginosa")
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in download.file(url, dest, quiet = TRUE) : 
  download from 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz' failed
In addition: Warning messages:
1: In download.file(url, dest, quiet = TRUE) :
  downloaded length 2893719956 != reported length 11377358957
2: In download.file(url, dest, quiet = TRUE) :
  URL 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz': Timeout of 1000 seconds was reached

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] clusterProfiler_4.4.4       shiny_1.7.3                 RSQLite_2.2.19             
 [4] ggplot2_3.4.0               DESeq2_1.36.0               SummarizedExperiment_1.26.1
 [7] MatrixGenerics_1.8.1        matrixStats_0.63.0          GenomicRanges_1.48.0       
[10] GenomeInfoDb_1.32.4         MeSHDbi_1.32.0              AnnotationHub_3.4.0        
[13] BiocFileCache_2.4.0         dbplyr_2.2.1                AnnotationForge_1.38.1     
[16] AnnotationDbi_1.58.0        IRanges_2.30.1              S4Vectors_0.34.0           
[19] edgeR_3.38.4                limma_3.52.4                NOISeq_2.40.0              
[22] Matrix_1.5-3                Biobase_2.56.0              BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2              fastmatch_1.1-3              
  [3] plyr_1.8.8                    igraph_1.3.5                 
  [5] lazyeval_0.2.2                BiocParallel_1.30.4          
  [7] crosstalk_1.2.0               digest_0.6.30                
  [9] yulab.utils_0.0.5             htmltools_0.5.3              
 [11] GOSemSim_2.22.0               viridis_0.6.2                
 [13] GO.db_3.15.0                  fansi_1.0.3                  
 [15] magrittr_2.0.3                memoise_2.0.1                
 [17] Biostrings_2.64.1             annotate_1.74.0              
 [19] graphlayouts_0.8.4            enrichplot_1.16.2            
 [21] colorspace_2.0-3              blob_1.2.3                   
 [23] rappdirs_0.3.3                ggrepel_0.9.2                
 [25] xfun_0.35                     dplyr_1.0.10                 
 [27] crayon_1.5.2                  RCurl_1.98-1.9               
 [29] jsonlite_1.8.3                scatterpie_0.1.8             
 [31] genefilter_1.78.0             ape_5.6-2                    
 [33] survival_3.4-0                glue_1.6.2                   
 [35] polyclip_1.10-4               gtable_0.3.1                 
 [37] zlibbioc_1.42.0               XVector_0.36.0               
 [39] DelayedArray_0.22.0           scales_1.2.1                 
 [41] DOSE_3.22.1                   DBI_1.1.3                    
 [43] Rcpp_1.0.9                    viridisLite_0.4.1            
 [45] xtable_1.8-4                  tidytree_0.4.1               
 [47] gridGraphics_0.5-1            bit_4.0.5                    
 [49] DT_0.26                       htmlwidgets_1.5.4            
 [51] httr_1.4.4                    fgsea_1.22.0                 
 [53] RColorBrewer_1.1-3            ellipsis_0.3.2               
 [55] pkgconfig_2.0.3               XML_3.99-0.13                
 [57] farver_2.1.1                  sass_0.4.4                   
 [59] locfit_1.5-9.6                utf8_1.2.2                   
 [61] ggplotify_0.1.0               tidyselect_1.2.0             
 [63] rlang_1.0.6                   reshape2_1.4.4               
 [65] later_1.3.0                   munsell_0.5.0                
 [67] BiocVersion_3.15.2            tools_4.2.1                  
 [69] cachem_1.0.6                  downloader_0.4               
 [71] cli_3.4.1                     generics_0.1.3               
 [73] evaluate_0.18                 stringr_1.5.0                
 [75] fastmap_1.1.0                 yaml_2.3.6                   
 [77] ggtree_3.4.4                  knitr_1.41                   
 [79] bit64_4.0.5                   tidygraph_1.2.2              
 [81] purrr_0.3.5                   KEGGREST_1.36.3              
 [83] ggraph_2.1.0                  nlme_3.1-160                 
 [85] mime_0.12                     aplot_0.1.9                  
 [87] DO.db_2.9                     compiler_4.2.1               
 [89] rstudioapi_0.14               filelock_1.0.2               
 [91] curl_4.3.3                    png_0.1-8                    
 [93] interactiveDisplayBase_1.34.0 treeio_1.20.2                
 [95] tibble_3.1.8                  tweenr_2.0.2                 
 [97] geneplotter_1.74.0            bslib_0.4.1                  
 [99] stringi_1.7.8                 lattice_0.20-45              
[101] vctrs_0.5.1                   pillar_1.8.1                 
[103] lifecycle_1.0.3               BiocManager_1.30.19          
[105] jquerylib_0.1.4               data.table_1.14.6            
[107] bitops_1.0-7                  patchwork_1.1.2              
[109] httpuv_1.6.6                  qvalue_2.28.0                
[111] R6_2.5.1                      promises_1.2.0.1             
[113] gridExtra_2.3                 codetools_0.2-18             
[115] MASS_7.3-58.1                 assertthat_0.2.1             
[117] withr_2.5.0                   GenomeInfoDbData_1.2.8       
[119] parallel_4.2.1                ggfun_0.0.9                  
[121] grid_4.2.1                    tidyr_1.2.1                  
[123] rmarkdown_2.18                ggforce_0.4.1

Will you please take a look and suggest something?


It's taking too long to download the alternative GO data. Before you run the function, do

options(timeout = 5000)

Also, when you post code, please put a triple-backtick (the upper left key on a QWERTY keyboard) in the line before and after your code.
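
As a rough check on the timeout value, the warnings in the failed run give enough numbers for a back-of-the-envelope estimate. This sketch assumes the transfer ran for the full 1000 seconds at a steady rate, which may not hold in practice.

```r
# Numbers copied from the warnings in the failed run above.
downloaded_bytes <- 2893719956   # "downloaded length"
reported_bytes   <- 11377358957  # "reported length"
elapsed_seconds  <- 1000         # "Timeout of 1000 seconds was reached"

# Assumption: constant transfer rate over the whole download.
rate <- downloaded_bytes / elapsed_seconds
est_seconds <- reported_bytes / rate
round(est_seconds)  # a little over 3900 seconds at that rate

# Leave some headroom above the estimate before retrying.
options(timeout = 5000)
```

At that rate, 5000 seconds leaves only modest headroom, so a slower or interrupted connection could still fail.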


Thanks, James. I will do that next time.


Hi James, it still failed after I increased the timeout. Here is the error that I get after running my code:

> options(timeout = 5000)
> makeOrgPackageFromNCBI(version="0.1", author="Manmohit Kalia <mkalia@binghamton.edu>", maintainer="Manmohit Kalia <mkalia@binghamton.edu>", outputDir=".", tax_id="208964", genus="Pseudomonas", species="aeruginosa", NCBIFilesDir = ".", databaseOnly = FALSE, useDeprecatedStyle = FALSE, rebuildCache = TRUE, verbose = TRUE)
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in download.file(url, dest, quiet = TRUE) : 
  download from 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz' failed
In addition: Warning messages:
1: In download.file(url, dest, quiet = TRUE) :
  downloaded length 5525519956 != reported length 11377358957
2: In download.file(url, dest, quiet = TRUE) :
  URL 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz': status was 'Failure when receiving data from the peer'

Will you please check?


You could download the file from expasy.org directly, put it in your working directory, and use rebuildCache = FALSE in your call to makeOrgPackageFromNCBI. I would normally use either wget or curl for the download.
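
A minimal sketch of that workaround, driven from R. It assumes `curl` is on the PATH, that the function looks for a file named after the URL's basename inside `NCBIFilesDir`, and that the remaining arguments match the earlier call in this thread. `curl_args` and `build_pao1_orgdb` are hypothetical helper names, and the author and maintainer strings are placeholders. Nothing is downloaded or built until the two commented calls at the bottom are run by hand.

```r
url  <- "https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz"
dest <- file.path(".", basename(url))  # assumed location: the working directory

# Hypothetical helper: arguments for a resumable curl download.
# -L follows redirects, -C - resumes a partial file, --retry retries on failure.
curl_args <- function(url, dest) {
  c("-L", "-C", "-", "--retry", "5", "-o", shQuote(dest), shQuote(url))
}

# Hypothetical wrapper around the call used earlier in this thread,
# with rebuildCache = FALSE so files already on disk are reused.
build_pao1_orgdb <- function() {
  AnnotationForge::makeOrgPackageFromNCBI(
    version = "0.1",
    author = "Your Name <you@example.org>",      # placeholder
    maintainer = "Your Name <you@example.org>",  # placeholder
    outputDir = ".",
    tax_id = "208964",
    genus = "Pseudomonas",
    species = "aeruginosa",
    NCBIFilesDir = ".",
    rebuildCache = FALSE
  )
}

# Run by hand, in this order:
# system2("curl", curl_args(url, dest))
# build_pao1_orgdb()
```

Resuming with `-C -` means an interrupted transfer of this roughly 11 GB file does not have to start over.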


Thanks for the suggestion, James. This worked.


When I try to install the package using BiocManager::install, it gives me an error:

BiocManager::install("./org.Paeruginosa.eg.db", character.only = TRUE)
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for
details

replacement repositories:
    CRAN: https://cran.rstudio.com/

Bioconductor version 3.15 (BiocManager 1.30.19), R 4.2.1 (2022-06-23)
Installing github package(s) './org.Paeruginosa.eg.db'
Error: package 'remotes' not installed in library path(s)
    /Library/Frameworks/R.framework/Versions/4.2/Resources/library
install with 'BiocManager::install("remotes")'

Am I doing anything wrong? I want to use it with clusterProfiler.

install.packages("./org.Paeruginosa.eg.db", repos = NULL)
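
For context, the BiocManager output above shows the path being treated as a GitHub repository ("Installing github package(s)"), which is why it asked for remotes. A hedged sketch of the local install follows. `orgdb_name` is a hypothetical helper whose naming pattern is inferred from the directory name in this thread, and the install only runs if that directory exists.

```r
# Hypothetical helper: reproduces the naming pattern seen in this thread
# ("org." + first letter of genus + species + ".eg.db").
orgdb_name <- function(genus, species) {
  paste0("org.", toupper(substr(genus, 1, 1)), species, ".eg.db")
}

pkg <- orgdb_name("Pseudomonas", "aeruginosa")
pkg_dir <- file.path(".", pkg)

if (dir.exists(pkg_dir)) {
  # Install from the local source directory rather than from a repository.
  install.packages(pkg_dir, repos = NULL, type = "source")
  library(pkg, character.only = TRUE)
  # List the identifier types the new package accepts, to help pick a
  # keyType for clusterProfiler::enrichGO.
  print(AnnotationDbi::keytypes(get(pkg)))
}
```

The key types printed at the end are worth checking against the gene identifiers in your own lists before running the enrichment.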