Question

Problem making an OrgDb package for bacteria (Pseudomonas) using AnnotationHub and AnnotationForge

Manmohit ▴ 10

I am trying to make an OrgDb package for my bacterium (Pseudomonas aeruginosa PAO1), but when I run the makeAnnDbPkg command it gives me an error that is difficult for me, as a beginner, to resolve.

library(AnnotationHub)
library(AnnotationForge)
hub <- AnnotationHub()
query(hub, c("pseudomonas"))
org.PAO1.eg.db <- hub[["AH91625"]]
file.copy(AnnotationHub::cache(hub["AH91625"]), "./org.Pseudomonas aeruginosa PAO1.eg.sqlite")
seed <- new("AnnDbPkgSeed", Package="./org.Pseudomonas aeruginosa PAO1.eg.db", Version="0.0.1", PkgTemplate="NCBIORG.DB", AnnObjPrefix="org.Pseudomonas aeruginosa PAO1.eg", organism="Pseudomonas aeruginosa", species="Pseudomonas aeruginosa PAO1")

makeAnnDbPkg(seed,"./org.Pseudomonas aeruginosa PAO1.eg.sqlite")

Error in initWithDbMetada(x, dbfile) : 
  "metadata" table has unexpected col names


sessionInfo( )
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] shiny_1.7.3                 RSQLite_2.2.19              ggplot2_3.4.0              
 [4] DESeq2_1.36.0               SummarizedExperiment_1.26.1 MatrixGenerics_1.8.1       
 [7] matrixStats_0.63.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[10] MeSHDbi_1.32.0              AnnotationHub_3.4.0         BiocFileCache_2.4.0        
[13] dbplyr_2.2.1                AnnotationForge_1.38.1      AnnotationDbi_1.58.0       
[16] IRanges_2.30.1              S4Vectors_0.34.0            edgeR_3.38.4               
[19] limma_3.52.4                NOISeq_2.40.0               Matrix_1.5-3               
[22] Biobase_2.56.0              BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                  bit64_4.0.5                  
 [3] filelock_1.0.2                RColorBrewer_1.1-3           
 [5] httr_1.4.4                    bslib_0.4.1                  
 [7] tools_4.2.1                   DT_0.26                      
 [9] utf8_1.2.2                    R6_2.5.1                     
[11] colorspace_2.0-3              DBI_1.1.3                    
[13] withr_2.5.0                   tidyselect_1.2.0             
[15] bit_4.0.5                     curl_4.3.3                   
[17] compiler_4.2.1                cli_3.4.1                    
[19] DelayedArray_0.22.0           sass_0.4.4                   
[21] scales_1.2.1                  genefilter_1.78.0            
[23] rappdirs_0.3.3                digest_0.6.30                
[25] rmarkdown_2.18                XVector_0.36.0               
[27] pkgconfig_2.0.3               htmltools_0.5.3              
[29] fastmap_1.1.0                 htmlwidgets_1.5.4            
[31] rlang_1.0.6                   rstudioapi_0.14              
[33] jquerylib_0.1.4               generics_0.1.3               
[35] jsonlite_1.8.3                crosstalk_1.2.0              
[37] BiocParallel_1.30.4           dplyr_1.0.10                 
[39] RCurl_1.98-1.9                magrittr_2.0.3               
[41] GenomeInfoDbData_1.2.8        munsell_0.5.0                
[43] Rcpp_1.0.9                    fansi_1.0.3                  
[45] lifecycle_1.0.3               yaml_2.3.6                   
[47] zlibbioc_1.42.0               grid_4.2.1                   
[49] blob_1.2.3                    parallel_4.2.1               
[51] promises_1.2.0.1              crayon_1.5.2                 
[53] lattice_0.20-45               Biostrings_2.64.1            
[55] annotate_1.74.0               KEGGREST_1.36.3              
[57] locfit_1.5-9.6                knitr_1.41                   
[59] pillar_1.8.1                  geneplotter_1.74.0           
[61] codetools_0.2-18              XML_3.99-0.13                
[63] glue_1.6.2                    BiocVersion_3.15.2           
[65] evaluate_0.18                 BiocManager_1.30.19          
[67] png_0.1-8                     vctrs_0.5.1                  
[69] httpuv_1.6.6                  gtable_0.3.1                 
[71] purrr_0.3.5                   assertthat_0.2.1             
[73] cachem_1.0.6                  xfun_0.35                    
[75] mime_0.12                     xtable_1.8-4                 
[77] later_1.3.0                   survival_3.4-0               
[79] tibble_3.1.8                  memoise_2.0.1                
[81] ellipsis_0.3.2                interactiveDisplayBase_1.34.0

AnnotationHub AnnotationForge Pseudomonas_aeruginosa • 6.1k views

ADD COMMENT • link updated 16 months ago by James W. MacDonald 68k • written 2.3 years ago by Manmohit ▴ 10

shepherl · Answer 1 · 2022-12-07

James W. MacDonald 68k

Can you explain why you want to do that? You could just use AnnotationHub directly (it gets cached, so there is only one download), or you could use saveDb from the AnnotationDbi package if you really don't want to use the cached version.

ADD COMMENT • link 2.3 years ago James W. MacDonald 68k

Hi James, thank you for your reply. I tried using the cached version for my enrichment analysis, but I get an error, which is why I tried using AnnotationForge to save it from the cached version. Here is the error I get when I try to use the cached version. Please advise.

> hub <- AnnotationHub()
snapshotDate(): 2022-04-25
> query(hub,c("Pseudomonas"))
AnnotationHub with 6 records
# snapshotDate(): 2022-04-25
# $dataprovider: NCBI,DBCLS, PathBank, Inparanoid8
# $species: Pseudomonas aeruginosa PAO1, Pseudomonas aeruginosa
# $rdataclass: SQLiteFile, Tibble, Inparanoid8Db
# additional mcols(): taxonomyid, genome, description, coordinate_1_based,
#   maintainer, rdatadateadded, preparerclass, tags, rdatapath, sourceurl,
#   sourcetype 
# retrieve records with, e.g., 'object[["AH10565"]]' 

             title                                          
  AH10565  | hom.Pseudomonas_aeruginosa.inp8.sqlite         
  AH87076  | pathbank_Pseudomonas_aeruginosa_metabolites.rda
  AH87086  | pathbank_Pseudomonas_aeruginosa_proteins.rda   
  AH91625  | MeSHDb for Pseudomonas aeruginosa PAO1 (v001)  
  AH97892  | MeSHDb for Pseudomonas aeruginosa PAO1 (v002)  
  AH100357 | MeSHDb for Pseudomonas aeruginosa PAO1 (v003)  
> PAO1 <- hub[["AH91625"]]
loading from cache
> PAO1
                                                                        AH91625 
"/Users/Sauer/Library/Caches/org.R-project.R/R/AnnotationHub/3f66b1d23d6_98371" 
> library(clusterProfiler)
> ego <- enrichGO(gene=signif_genes, universe = all_genes, keyType="ENSEMBL",OrgDb=PAO1,ont="BP",pAdjustMethod="BH",qvalueCutoff=0.05,readable=TRUE)
Error in loadNamespace(name) : 
  there is no package called ‘/Users/Sauer/Library/Caches/org.R-project.R/R/AnnotationHub/3f66b1d23d6_98371’

ADD REPLY • link updated 2.3 years ago by shepherl 4.1k • written 2.3 years ago by Manmohit ▴ 10

There are two issues here. First, a MeSHDb isn't an OrgDb: a MeSHDb is meant to provide links between NCBI Gene IDs and MeSH IDs, so that's not the right package. Second, you are getting the cache location instead of a connection to the file. I don't know why that is, but it's academic in this situation, as you are trying to use the wrong thing. Unfortunately, there doesn't appear to be an OrgDb for P. aeruginosa.

> query(hub, c("pseudomonas", "orgdb"))
AnnotationHub with 0 records
# snapshotDate(): 2022-10-26

You could try to make your own OrgDb using makeOrgPackageFromNCBI, which is in the AnnotationForge package.
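
The mix-up described above (a file path or a MeSHDb being passed where an OrgDb is expected) can be caught before calling enrichGO. The following is a minimal sketch, not part of AnnotationHub or clusterProfiler: `is_orgdb` is a hypothetical helper name, and the path is a placeholder standing in for the cache location printed earlier in the thread.

```r
# Hypothetical helper (an assumption for illustration, not a Bioconductor API):
# returns TRUE only when `x` is an object of class "OrgDb".
is_orgdb <- function(x) {
  methods::is(x, "OrgDb")
}

# A hub record that comes back as a cache path is just a named character
# vector, so it fails the check and should not be passed as OrgDb=.
cached_path <- c(AH91625 = "/path/to/AnnotationHub/cache/file")
is_orgdb(cached_path)
```

A check like this fails early with a clear answer, instead of the less obvious "there is no package called ..." error shown above.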

ADD REPLY • link 2.3 years ago James W. MacDonald 68k

Hi James, thanks again for correcting me. As suggested, I tried making my own OrgDb from NCBI, but I get the error below. Here is my code:

makeOrgPackageFromNCBI(version="0.1", author="Manmohit Kalia <mkalia@binghamton.edu>", maintainer="Manmohit Kalia <mkalia@binghamton.edu>", outputDir=".", tax_id="208964", genus="Pseudomonas", species="aeruginosa")
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in download.file(url, dest, quiet = TRUE) : 
  download from 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz' failed
In addition: Warning messages:
1: In download.file(url, dest, quiet = TRUE) :
  downloaded length 2893719956 != reported length 11377358957
2: In download.file(url, dest, quiet = TRUE) :
  URL 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz': Timeout of 1000 seconds was reached

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] clusterProfiler_4.4.4       shiny_1.7.3                 RSQLite_2.2.19             
 [4] ggplot2_3.4.0               DESeq2_1.36.0               SummarizedExperiment_1.26.1
 [7] MatrixGenerics_1.8.1        matrixStats_0.63.0          GenomicRanges_1.48.0       
[10] GenomeInfoDb_1.32.4         MeSHDbi_1.32.0              AnnotationHub_3.4.0        
[13] BiocFileCache_2.4.0         dbplyr_2.2.1                AnnotationForge_1.38.1     
[16] AnnotationDbi_1.58.0        IRanges_2.30.1              S4Vectors_0.34.0           
[19] edgeR_3.38.4                limma_3.52.4                NOISeq_2.40.0              
[22] Matrix_1.5-3                Biobase_2.56.0              BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2              fastmatch_1.1-3              
  [3] plyr_1.8.8                    igraph_1.3.5                 
  [5] lazyeval_0.2.2                BiocParallel_1.30.4          
  [7] crosstalk_1.2.0               digest_0.6.30                
  [9] yulab.utils_0.0.5             htmltools_0.5.3              
 [11] GOSemSim_2.22.0               viridis_0.6.2                
 [13] GO.db_3.15.0                  fansi_1.0.3                  
 [15] magrittr_2.0.3                memoise_2.0.1                
 [17] Biostrings_2.64.1             annotate_1.74.0              
 [19] graphlayouts_0.8.4            enrichplot_1.16.2            
 [21] colorspace_2.0-3              blob_1.2.3                   
 [23] rappdirs_0.3.3                ggrepel_0.9.2                
 [25] xfun_0.35                     dplyr_1.0.10                 
 [27] crayon_1.5.2                  RCurl_1.98-1.9               
 [29] jsonlite_1.8.3                scatterpie_0.1.8             
 [31] genefilter_1.78.0             ape_5.6-2                    
 [33] survival_3.4-0                glue_1.6.2                   
 [35] polyclip_1.10-4               gtable_0.3.1                 
 [37] zlibbioc_1.42.0               XVector_0.36.0               
 [39] DelayedArray_0.22.0           scales_1.2.1                 
 [41] DOSE_3.22.1                   DBI_1.1.3                    
 [43] Rcpp_1.0.9                    viridisLite_0.4.1            
 [45] xtable_1.8-4                  tidytree_0.4.1               
 [47] gridGraphics_0.5-1            bit_4.0.5                    
 [49] DT_0.26                       htmlwidgets_1.5.4            
 [51] httr_1.4.4                    fgsea_1.22.0                 
 [53] RColorBrewer_1.1-3            ellipsis_0.3.2               
 [55] pkgconfig_2.0.3               XML_3.99-0.13                
 [57] farver_2.1.1                  sass_0.4.4                   
 [59] locfit_1.5-9.6                utf8_1.2.2                   
 [61] ggplotify_0.1.0               tidyselect_1.2.0             
 [63] rlang_1.0.6                   reshape2_1.4.4               
 [65] later_1.3.0                   munsell_0.5.0                
 [67] BiocVersion_3.15.2            tools_4.2.1                  
 [69] cachem_1.0.6                  downloader_0.4               
 [71] cli_3.4.1                     generics_0.1.3               
 [73] evaluate_0.18                 stringr_1.5.0                
 [75] fastmap_1.1.0                 yaml_2.3.6                   
 [77] ggtree_3.4.4                  knitr_1.41                   
 [79] bit64_4.0.5                   tidygraph_1.2.2              
 [81] purrr_0.3.5                   KEGGREST_1.36.3              
 [83] ggraph_2.1.0                  nlme_3.1-160                 
 [85] mime_0.12                     aplot_0.1.9                  
 [87] DO.db_2.9                     compiler_4.2.1               
 [89] rstudioapi_0.14               filelock_1.0.2               
 [91] curl_4.3.3                    png_0.1-8                    
 [93] interactiveDisplayBase_1.34.0 treeio_1.20.2                
 [95] tibble_3.1.8                  tweenr_2.0.2                 
 [97] geneplotter_1.74.0            bslib_0.4.1                  
 [99] stringi_1.7.8                 lattice_0.20-45              
[101] vctrs_0.5.1                   pillar_1.8.1                 
[103] lifecycle_1.0.3               BiocManager_1.30.19          
[105] jquerylib_0.1.4               data.table_1.14.6            
[107] bitops_1.0-7                  patchwork_1.1.2              
[109] httpuv_1.6.6                  qvalue_2.28.0                
[111] R6_2.5.1                      promises_1.2.0.1             
[113] gridExtra_2.3                 codetools_0.2-18             
[115] MASS_7.3-58.1                 assertthat_0.2.1             
[117] withr_2.5.0                   GenomeInfoDbData_1.2.8       
[119] parallel_4.2.1                ggfun_0.0.9                  
[121] grid_4.2.1                    tidyr_1.2.1                  
[123] rmarkdown_2.18                ggforce_0.4.1

Will you please take a look and suggest something?

ADD REPLY • link updated 2.3 years ago by James W. MacDonald 68k • written 2.3 years ago by Manmohit ▴ 10

It's taking too long to download the alternative GO data. Before you run the function, do

options(timeout = 5000)

Also, when you post code, please put a triple-backtick (the upper left key on a QWERTY keyboard) in the line before and after your code.
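
The timeout change suggested above can also be scoped to a single call, so the session default is restored afterwards. This is only a sketch using base R; `with_download_timeout` is a hypothetical helper name, not something provided by AnnotationForge.

```r
# Hypothetical wrapper: evaluate `expr` with a download timeout of at least
# `seconds`, then restore whatever timeout was set before.
with_download_timeout <- function(seconds, expr) {
  old <- options(timeout = max(seconds, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # `expr` is evaluated lazily, i.e. only here, after the option is raised
  expr
}

# Usage (not run here; needs AnnotationForge and a network connection):
# with_download_timeout(5000, makeOrgPackageFromNCBI(...))
```

Note that a longer timeout only helps if the connection stays up; it does not resume a transfer that the server drops partway through.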

ADD REPLY • link 2.3 years ago James W. MacDonald 68k

Thanks, James. I will do that next time.

ADD REPLY • link 2.3 years ago Manmohit ▴ 10

Hi James, after I increased the timeout, it still failed. Here is the error that I get after running my code.

> options(timeout = 5000)
> makeOrgPackageFromNCBI(version="0.1", author="Manmohit Kalia <mkalia@binghamton.edu>", maintainer="Manmohit Kalia <mkalia@binghamton.edu>", outputDir=".", tax_id="208964", genus="Pseudomonas", species="aeruginosa", NCBIFilesDir = ".", databaseOnly = FALSE, useDeprecatedStyle = FALSE, rebuildCache = TRUE, verbose = TRUE)
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in download.file(url, dest, quiet = TRUE) : 
  download from 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz' failed
In addition: Warning messages:
1: In download.file(url, dest, quiet = TRUE) :
  downloaded length 5525519956 != reported length 11377358957
2: In download.file(url, dest, quiet = TRUE) :
  URL 'https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz': status was 'Failure when receiving data from the peer'

Will you please check?

ADD REPLY • link 2.3 years ago Manmohit ▴ 10

You could download the file from expasy.org directly, put it in your working directory, and use rebuildCache = FALSE in your call to makeOrgPackageFromNCBI. I would normally use either wget or curl for the download.
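
A rough sketch of that manual download, driven from R, follows. The URL and the reported length are copied from the error log in this thread (the length will change with each UniProt release); `fetch_with_curl` and `is_complete_download` are hypothetical helper names, and the sketch assumes the curl command-line tool is on the PATH.

```r
# Values taken from the error log above.
idmapping_url <- "https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz"
reported_bytes <- 11377358957  # "reported length" in the warning

# Hypothetical helper: resume-capable download with the curl CLI.
# -L follows redirects, -C - resumes a partial file, -o names the output file.
fetch_with_curl <- function(url, dest) {
  system2("curl", c("-L", "-C", "-", "-o", shQuote(dest), shQuote(url)))
}

# Hypothetical helper: TRUE only if the file exists and has the expected size.
is_complete_download <- function(path, expected_bytes) {
  file.exists(path) && isTRUE(file.size(path) == expected_bytes)
}

# Usage (not run here; downloads roughly 11 GB):
# dest <- file.path(".", basename(idmapping_url))
# fetch_with_curl(idmapping_url, dest)
# is_complete_download(dest, reported_bytes)
```

Comparing the size on disk with the reported length is a cheap way to spot the truncated transfers seen in the two failed attempts above before rerunning makeOrgPackageFromNCBI with rebuildCache = FALSE.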

ADD REPLY • link 2.3 years ago James W. MacDonald 68k

Thanks, James, for the suggestion. This worked.

ADD REPLY • link 2.3 years ago Manmohit ▴ 10

When I try to install the package using BiocManager::install, it gives me an error:

BiocManager::install("./org.Paeruginosa.eg.db", character.only = TRUE)
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for
details

replacement repositories:
    CRAN: https://cran.rstudio.com/

Bioconductor version 3.15 (BiocManager 1.30.19), R 4.2.1 (2022-06-23)
Installing github package(s) './org.Paeruginosa.eg.db'
Error: package 'remotes' not installed in library path(s)
    /Library/Frameworks/R.framework/Versions/4.2/Resources/library
install with 'BiocManager::install("remotes")'

Am I doing anything wrong? I want to use it with clusterProfiler.

ADD REPLY • link 2.3 years ago Manmohit ▴ 10

install.packages("./org.Paeruginosa.eg.db", repos = NULL)
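
For context, the log above shows that BiocManager::install treated the path as a GitHub package ("Installing github package(s)"), which is why it asked for remotes. Installing from the local source directory avoids that. Below is a small sketch around the same install.packages call; `install_local_orgdb` is a hypothetical helper name, and the DESCRIPTION check is an added assumption about what a generated package directory should contain.

```r
# Hypothetical helper: install a package from a local source directory,
# failing early if the directory does not look like an R package.
install_local_orgdb <- function(pkg_dir) {
  stopifnot(file.exists(file.path(pkg_dir, "DESCRIPTION")))
  install.packages(pkg_dir, repos = NULL, type = "source")
}

# Usage (not run here):
# install_local_orgdb("./org.Paeruginosa.eg.db")
# library(org.Paeruginosa.eg.db)
# keytypes(org.Paeruginosa.eg.db)  # check which keyType values enrichGO can use
```

Checking keytypes() first is worthwhile because an OrgDb built from NCBI files may not offer the ENSEMBL key type used in the earlier enrichGO call.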

ADD REPLY • link 2.3 years ago James W. MacDonald 68k

shepherl · Answer 2 · 2023-11-21

Aastha Kapoor • 0

Hi, I want to create a database for Klebsiella pneumoniae. I tried querying AnnotationHub, but it returns no records.

ah <- AnnotationHub()
query(ah, c("OrgDb", "Klebsiella"))

or

library("AnnotationHub")
hub <- AnnotationHub()
query(hub,c("Klebsiella"))

AnnotationHub with 0 records
# snapshotDate(): 2023-10-20

I also tried makeOrgPackageFromNCBI, but that is also not working:

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae")

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error in .tryDL(url, tmp) : url access failed after
4
attempts; url:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
In addition: Warning messages:
1: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  downloaded length 0 != reported length 0
2: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached
3: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  downloaded length 0 != reported length 0
4: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached
5: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  downloaded length 0 != reported length 0
6: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached
7: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  downloaded length 0 != reported length 0
8: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
  URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached

When I downloaded all five files from NCBI (https://ftp.ncbi.nlm.nih.gov/gene/DATA/) into the working directory, it still did not work.

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae", 
                       rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: main.gene2accession

Please suggest something.

Thanks Aastha

ADD COMMENT • link updated 16 months ago by shepherl 4.1k • written 16 months ago by Aastha Kapoor • 0

sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8    LC_MONETARY=English_India.utf8
[4] LC_NUMERIC=C                   LC_TIME=English_India.utf8

time zone: Asia/Calcutta
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] AnnotationHub_3.10.0   BiocFileCache_2.10.1   dbplyr_2.4.0           AnnotationForge_1.44.0
 [5] AnnotationDbi_1.64.1   IRanges_2.36.0         S4Vectors_0.40.1       Biobase_2.62.0
 [9] BiocGenerics_0.48.1    clusterProfiler_4.10.0

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3            jsonlite_1.8.7     rstudioapi_0.15.0
  [4] magrittr_2.0.3                farver_2.1.1       fs_1.6.3
  [7] zlibbioc_1.48.0               vctrs_0.6.4        memoise_2.0.1
 [10] RCurl_1.98-1.13               ggtree_3.10.0      htmltools_0.5.7
 [13] usethis_2.2.2                 curl_5.1.0         gridGraphics_0.5-1
 [16] htmlwidgets_1.6.2             plyr_1.8.9         cachem_1.0.8
 [19] igraph_1.5.1                  mime_0.12          lifecycle_1.0.4
 [22] pkgconfig_2.0.3               gson_0.1.0         Matrix_1.6-1.1
 [25] R6_2.5.1                      fastmap_1.1.1      GenomeInfoDbData_1.2.11
 [28] shiny_1.8.0                   digest_0.6.33      aplot_0.2.2
 [31] enrichplot_1.22.0             colorspace_2.1-0   patchwork_1.1.3
 [34] ps_1.7.5                      pkgload_1.3.3      RSQLite_2.3.3
 [37] filelock_1.0.2                fansi_1.0.5        httr_1.4.7
 [40] polyclip_1.10-6               compiler_4.3.2     remotes_2.4.2.1
 [43] bit64_4.0.5                   withr_2.5.2        BiocParallel_1.36.0
 [46] viridis_0.6.4                 DBI_1.1.3          pkgbuild_1.4.2
 [49] ggforce_0.4.1                 MASS_7.3-60        rappdirs_0.3.3
 [52] sessioninfo_1.2.2             HDO.db_0.99.1      tools_4.3.2
 [55] interactiveDisplayBase_1.40.0 scatterpie_0.2.1   ape_5.7-1
 [58] httpuv_1.6.12                 glue_1.6.2         callr_3.7.3
 [61] nlme_3.1-163                  GOSemSim_2.28.0    promises_1.2.1
 [64] shadowtext_0.1.2              grid_4.3.2         reshape2_1.4.4
 [67] fgsea_1.28.0                  generics_0.1.3     gtable_0.3.4
 [70] tidyr_1.3.0                   data.table_1.14.8  tidygraph_1.2.3
 [73] utf8_1.2.4                    XVector_0.42.0     BiocVersion_3.18.1
 [76] ggrepel_0.9.4                 pillar_1.9.0       stringr_1.5.1
 [79] yulab.utils_0.1.0             later_1.3.1        splines_4.3.2
 [82] dplyr_1.1.3                   tweenr_2.0.2       treeio_1.26.0
 [85] lattice_0.21-9                bit_4.0.5          tidyselect_1.2.0
 [88] GO.db_3.18.0                  Biostrings_2.70.1  miniUI_0.1.1.1
 [91] gridExtra_2.3                 graphlayouts_1.0.2 devtools_2.4.5
 [94] stringi_1.8.1                 yaml_2.3.7         lazyeval_0.2.2
 [97] ggfun_0.1.3                   codetools_0.2-19   ggraph_2.1.0
[100] tibble_3.2.1                  qvalue_2.34.0      BiocManager_1.30.22
[103] ggplotify_0.1.2               cli_3.6.1          xtable_1.8-4
[106] munsell_0.5.0                 processx_3.8.2     Rcpp_1.0.11
[109] GenomeInfoDb_1.38.1           png_0.1-8          XML_3.99-0.15
[112] parallel_4.3.2                ellipsis_0.3.2     ggplot2_3.4.4
[115] blob_1.2.4                    prettyunits_1.2.0  profvis_0.3.8
[118] DOSE_3.28.1                   urlchecker_1.0.1   bitops_1.0-7
[121] viridisLite_0.4.2             tidytree_0.4.5     scales_1.2.1
[124] purrr_1.0.2                   crayon_1.5.2       rlang_1.1.1
[127] cowplot_1.1.1                 fastmatch_1.1-4    KEGGREST_1.42.0

ADD REPLY • link 16 months ago Aastha Kapoor • 0

Please don't add on to old posts, and if you do so, please don't do so by adding an answer to that post! Ideally you should start a new thread.

The problem is that there is no data for that species. It looks like you have generated the NCBI.sqlite database, which you can connect to and query.

> library(RSQLite)
> con <- dbConnect(SQLite(), "NCBI.sqlite")
> dbListTables(con)
 [1] "altGO"              
 [2] "altGO_date"         
 [3] "gene2accession"     
 [4] "gene2accession_date"
 [5] "gene2go"            
 [6] "gene2go_date"       
 [7] "gene2pubmed"        
 [8] "gene2pubmed_date"   
 [9] "gene2refseq"        
[10] "gene2refseq_date"   
[11] "gene_info"          
[12] "gene_info_date"     

> dbGetQuery(con, "select * from gene2accession where tax_id='583' limit 5;")
 [1] tax_id               
 [2] gene_id              
 [3] status               
 [4] rna_accession        
 [5] rna_gi               
 [6] protein_accession    
 [7] protein_gi           
 [8] genomic_dna_accession
 [9] genomic_dna_gi       
[10] genomic_start        
[11] genomic_end          
[12] orientation          
[13] assembly             
[14] peptide_accession    
[15] peptide_gi           
[16] symbol               
<0 rows> (or 0-length row.names)

Unfortunately bacteria are not well annotated at NCBI, and I don't see anything for this species at Ensembl either.

ADD REPLY • link 16 months ago James W. MacDonald 68k

Oh wait, my bad. Wrong tax_id.

> dbGetQuery(con, "select * from gene2accession where tax_id='573' limit 5;")
  tax_id  gene_id status
1    573 39626215      -
2    573 39626215      -
3    573 39626215      -
4    573 39626215      -
5    573 39626215      -
  rna_accession rna_gi
1             -      -
2             -      -
3             -      -
4             -      -
5             -      -
  protein_accession protein_gi
1        AUN70723.1 1325153910
2        AUN86180.1 1325171295
3        AVB75012.1 1342458681
4        AWG77226.1 1385671902
5        AWJ18664.1 1388214698
  genomic_dna_accession
1            CP025632.1
2            CP025634.1
3            CP026587.1
4            CP028916.1
5            CP029221.1
  genomic_dna_gi genomic_start
1     1325153821             -
2     1325171200             -
3     1342458566             -
4     1385671893             -
5     1388214531             -
  genomic_end orientation assembly
1           -           ?        -
2           -           ?        -
3           -           ?        -
4           -           ?        -
5           -           ?        -
  peptide_accession peptide_gi
1                 -          -
2                 -          -
3                 -          -
4                 -          -
5                 -          -
         symbol
1 ENZ43_RS00015
2 ENZ43_RS00015
3 ENZ43_RS00015
4 ENZ43_RS00015
5 ENZ43_RS00015
> 

## And
> makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae", 
                       rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data

So it looks like it's going to work for me.

You might check that the gene2accession.gz file is OK. It's a large file and might not have all been downloaded.

md5sum gene2accession.gz
348dbba2b1a9eaaba6a7e7f67082cb81 *gene2accession.gz

If you don't have md5sum installed, you can get it by installing Rtools (which is useful to have on Windows anyway).
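
The same check can be done from within R, without a separate md5sum program, using tools::md5sum from the base distribution. This is a minimal sketch; `md5_matches` is a hypothetical helper name. The checksum quoted above belongs to the copy of gene2accession.gz that existed when it was computed, so it is an assumption that a later download should match it: the NCBI files are updated regularly.

```r
# Hypothetical helper: TRUE if the MD5 checksum of `path` equals `expected`.
# tools::md5sum() returns NA for files it cannot read.
md5_matches <- function(path, expected) {
  actual <- unname(tools::md5sum(path))
  !is.na(actual) && identical(tolower(actual), tolower(expected))
}

# Usage (not run here):
# md5_matches("gene2accession.gz", "348dbba2b1a9eaaba6a7e7f67082cb81")
```

If a reference checksum is not available, comparing two independent downloads of the same file is a weaker but still useful sanity check.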

ADD REPLY • link 16 months ago James W. MacDonald 68k

Thanks for sharing.

I followed the steps, but I do not get the same output as your commands show.

BiocManager::install("clusterProfiler")
library("clusterProfiler")
BiocManager::install("AnnotationForge")
library("AnnotationForge")
setwd()
install.packages("RSQLite")
library("RSQLite")
library("DBI")
con <- dbConnect(SQLite(), "NCBI.sqlite")
dbListTables(con)
[1] "gene2pubmed" "gene2pubmed_date" "gene_info"

I have downloaded gene2pubmed, gene2accession, gene_info, gene2refseq and gene2go into the working directory.

When I ran the next command, it could not locate the 3.1 GB gene2accession file.

dbGetQuery(con, "select * from gene2accession where tax_id='573' limit 5;")
Error: no such table: gene2accession

How should I proceed?

ADD REPLY • link 16 months ago Aastha Kapoor • 0

I would delete the NCBI.sqlite that was created, as it seems to be corrupt, and run makeOrgPackageFromNCBI again with rebuildCache=TRUE after setting a higher timeout limit with options(timeout=10000) so the files can be downloaded. But James might have a better idea.
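
Before deleting the cache, it can help to see exactly which tables are missing from it. The sketch below compares the tables in NCBI.sqlite with the twelve listed in James's working database earlier in the thread. Treating that list as the complete expected set is an assumption, and `missing_ncbi_tables` is a hypothetical helper name.

```r
# Table names copied from the dbListTables() output of the working database
# shown earlier in this thread (assumed here to be the full expected set).
expected_tables <- c(
  "altGO", "altGO_date",
  "gene2accession", "gene2accession_date",
  "gene2go", "gene2go_date",
  "gene2pubmed", "gene2pubmed_date",
  "gene2refseq", "gene2refseq_date",
  "gene_info", "gene_info_date"
)

# Hypothetical helper: names in `expected` that are absent from the database.
missing_ncbi_tables <- function(db_path, expected) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection
  setdiff(expected, DBI::dbListTables(con))
}

# Usage (not run here):
# missing_ncbi_tables("NCBI.sqlite", expected_tables)
# file.remove("NCBI.sqlite")  # then rerun with rebuildCache = TRUE
```

Closing the connection with on.exit also avoids the "call dbDisconnect() when finished working with a connection" warning.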

ADD REPLY • link 16 months ago shepherl 4.1k

Try increasing your timeout limit with something like options(timeout=10000). I believe the code not only downloads the files but also extracts certain data from them, and downloading the files manually would skip those steps.

ADD REPLY • link 16 months ago shepherl 4.1k

I also tried rerunning the above command with timeout=10000, but it still stopped after downloading a few files.

options(timeout=10000)
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae")

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
Error: no such table: gene_info_date
In addition: Warning message:
call dbDisconnect() when finished working with a connection

Then I tried running the above commands to connect to the SQLite database. That also gave an error.

con <- dbConnect(RSQLite::SQLite(), "NCBI.sqlite")
dbListTables(con)
dbGetQuery(con, "select * from gene2accession where tax_id='573' limit 5;")

> library("RSQLite")
> library("DBI")
> con <- dbConnect(RSQLite::SQLite(), "NCBI.sqlite")
> dbListTables(con)
[1] "gene2accession"      "gene2accession_date" "gene2pubmed"         "gene2pubmed_date"   
[5] "gene2refseq"         "gene2refseq_date"    "gene_info"          
> dbGetQuery(con, "select * from gene2accession where tax_id='573' limit 5;")
  tax_id  gene_id status rna_accession rna_gi protein_accession protein_gi genomic_dna_accession
1    573 39626215      -             -      -        AUN70723.1 1325153910            CP025632.1
2    573 39626215      -             -      -        AUN86180.1 1325171295            CP025634.1
3    573 39626215      -             -      -        AVB75012.1 1342458681            CP026587.1
4    573 39626215      -             -      -        AWG77226.1 1385671902            CP028916.1
5    573 39626215      -             -      -        AWJ18664.1 1388214698            CP029221.1
  genomic_dna_gi genomic_start genomic_end orientation assembly peptide_accession peptide_gi
1     1325153821             -           -           ?        -                 -          -
2     1325171200             -           -           ?        -                 -          -
3     1342458566             -           -           ?        -                 -          -
4     1385671893             -           -           ?        -                 -          -
5     1388214531             -           -           ?        -                 -          -
         symbol
1 ENZ43_RS00015
2 ENZ43_RS00015
3 ENZ43_RS00015
4 ENZ43_RS00015
5 ENZ43_RS00015
> md5sum gene2accession.gz
Error: unexpected symbol in "md5sum gene2accession.gz"
> makeOrgPackageFromNCBI(version = "0.1",
+                        author = "Aastha kapoor <aasthakapoor95@gmail.com>",
+                        maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
+                        outputDir = ".",
+                        tax_id = "573",
+                        genus = "Klebsiella",
+                        species = "pneumoniae", 
+                        rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
Error: no such table: main.gene2go

Shall I increase the timeout and then rerun with rebuildCache=TRUE?

ADD REPLY • link updated 16 months ago by James W. MacDonald 68k • written 16 months ago by Aastha Kapoor • 0

It might be easier to go here and download all the files that start with gene2, as well as gene_info. Then delete the NCBI.sqlite file and rerun using rebuildCache=FALSE. It looks like you are not downloading the files completely, so getting them directly might be easier. Or you could set the timeout to some huge number. Your call.
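
If a browser or wget is not convenient, the same files can be fetched from R. This is only a sketch: the base URL and file names are the ones that appear in the download messages in this thread, `fetch_ncbi_files` is a hypothetical helper, and the timeout value is arbitrary.

```r
# Base URL and file names as they appear in the messages in this thread.
base_url <- "https://ftp.ncbi.nlm.nih.gov/gene/DATA/"
ncbi_files <- c("gene2pubmed.gz", "gene2accession.gz", "gene2refseq.gz",
                "gene_info.gz", "gene2go.gz")
ncbi_urls <- paste0(base_url, ncbi_files)

# Hypothetical helper: download every file that is not yet in `dest_dir`.
# A partially downloaded file still counts as present, so delete any
# suspect file before calling this.
fetch_ncbi_files <- function(dest_dir = ".") {
  old <- options(timeout = 10000)   # large files need a long timeout
  on.exit(options(old), add = TRUE)
  for (i in seq_along(ncbi_urls)) {
    dest <- file.path(dest_dir, ncbi_files[i])
    if (!file.exists(dest)) {
      # mode = "wb" keeps the gzip archives intact on Windows
      download.file(ncbi_urls[i], dest, mode = "wb")
    }
  }
  invisible(file.path(dest_dir, ncbi_files))
}

# fetch_ncbi_files()   # uncomment to start the (large) downloads
```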

ADD REPLY • link 16 months ago James W. MacDonald 68k

I tried running it with an increased timeout:

options(timeout=1000000)
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae")

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : error reading from the connection
In addition: Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : invalid or incomplete compressed data

What shall I do?

ADD REPLY • link 16 months ago Aastha Kapoor • 0

Reread what I said to do, and then do that.

ADD REPLY • link 16 months ago James W. MacDonald 68k

I downloaded all the suggested files and then ran the command with rebuildCache=FALSE (I had also deleted the NCBI.sqlite file before running the command). But it still gave me an error.

setwd("E:/database")

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae",
                       rebuildCache = FALSE)

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed

I thought it might be a download issue, so I re-ran the command with an increased timeout.

options(timeout=1000000)

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae")

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error: disk I/O error
In addition: Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In download.file(url, tmp, quiet = TRUE, mode = "wb") : downloaded length 0 != reported length 0
3: In download.file(url, tmp, quiet = TRUE, mode = "wb") : URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': status was 'Failed writing received data to disk/application'
4: In download.file(url, tmp, quiet = TRUE, mode = "wb") : downloaded length 0 != reported length 0
5: In download.file(url, tmp, quiet = TRUE, mode = "wb") : URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': status was 'Transferred a partial file'
Error: no such savepoint: dbWriteTable_24320_mlthoemfps
In addition: Warning message:
Closing open result set, pending rows

It still did not work. I really don't know what the issue is. All the required files are currently on a hard disk (connected to the system); could that be an issue?

sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8 LC_MONETARY=English_India.utf8
[4] LC_NUMERIC=C LC_TIME=English_India.utf8

time zone: Asia/Calcutta
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] AnnotationForge_1.44.0 AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.1
[5] Biobase_2.62.0 BiocGenerics_0.48.1 clusterProfiler_4.10.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.7 rstudioapi_0.15.0
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] zlibbioc_1.48.0 vctrs_0.6.4 memoise_2.0.1
[10] RCurl_1.98-1.13 ggtree_3.10.0 htmltools_0.5.7
[13] usethis_2.2.2 curl_5.1.0 AnnotationHub_3.10.0
[16] gridGraphics_0.5-1 htmlwidgets_1.6.3 plyr_1.8.9
[19] cachem_1.0.8 igraph_1.5.1 mime_0.12
[22] lifecycle_1.0.4 pkgconfig_2.0.3 gson_0.1.0
[25] Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
[28] GenomeInfoDbData_1.2.11 shiny_1.8.0 digest_0.6.33
[31] aplot_0.2.2 enrichplot_1.22.0 colorspace_2.1-0
[34] patchwork_1.1.3 ps_1.7.5 pkgload_1.3.3
[37] RSQLite_2.3.3 filelock_1.0.2 fansi_1.0.5
[40] httr_1.4.7 polyclip_1.10-6 compiler_4.3.2
[43] remotes_2.4.2.1 bit64_4.0.5 withr_2.5.2
[46] BiocParallel_1.36.0 viridis_0.6.4 DBI_1.1.3
[49] pkgbuild_1.4.2 ggforce_0.4.1 MASS_7.3-60
[52] rappdirs_0.3.3 sessioninfo_1.2.2 HDO.db_0.99.1
[55] tools_4.3.2 interactiveDisplayBase_1.40.0 scatterpie_0.2.1
[58] ape_5.7-1 httpuv_1.6.12 glue_1.6.2
[61] callr_3.7.3 nlme_3.1-163 GOSemSim_2.28.0
[64] promises_1.2.1 shadowtext_0.1.2 grid_4.3.2
[67] reshape2_1.4.4 fgsea_1.28.0 generics_0.1.3
[70] gtable_0.3.4 tidyr_1.3.0 data.table_1.14.8
[73] tidygraph_1.2.3 utf8_1.2.4 XVector_0.42.0
[76] BiocVersion_3.18.1 ggrepel_0.9.4 pillar_1.9.0
[79] stringr_1.5.1 yulab.utils_0.1.0 later_1.3.1
[82] splines_4.3.2 dplyr_1.1.4 tweenr_2.0.2
[85] BiocFileCache_2.10.1 treeio_1.26.0 lattice_0.21-9
[88] bit_4.0.5 tidyselect_1.2.0 GO.db_3.18.0
[91] Biostrings_2.70.1 miniUI_0.1.1.1 gridExtra_2.3
[94] graphlayouts_1.0.2 devtools_2.4.5 stringi_1.8.1
[97] yaml_2.3.7 lazyeval_0.2.2 ggfun_0.1.3
[100] codetools_0.2-19 ggraph_2.1.0 tibble_3.2.1
[103] qvalue_2.34.0 BiocManager_1.30.22 ggplotify_0.1.2
[106] cli_3.6.1 xtable_1.8-4 munsell_0.5.0
[109] processx_3.8.2 Rcpp_1.0.11 GenomeInfoDb_1.38.1
[112] dbplyr_2.4.0 png_0.1-8 XML_3.99-0.15
[115] parallel_4.3.2 ellipsis_0.3.2 ggplot2_3.4.4
[118] blob_1.2.4 prettyunits_1.2.0 profvis_0.3.8
[121] DOSE_3.28.1 urlchecker_1.0.1 bitops_1.0-7
[124] viridisLite_0.4.2 tidytree_0.4.5 scales_1.2.1
[127] purrr_1.0.2 crayon_1.5.2 rlang_1.1.1
[130] cowplot_1.1.1 fastmatch_1.1-4 KEGGREST_1.42.0

Please help me solve this issue.

ADD REPLY • link 16 months ago Aastha Kapoor • 0

One thing you will have to learn if you plan to use R for any length of time is that the help people give you (help pages, online support sites, etc.) can be very terse, but it almost always tells you exactly what you need to do.

So let me reiterate what I have told you to do, and now I will be as completely descriptive as possible.

1.) DELETE the NCBI.sqlite database

2.) Download the files directly from NCBI

3.) Rerun your code with rebuildCache=FALSE

If you re-read what I already told you to do, two answers ago, that is exactly what I said to do. Nothing about changing the timeout options or re-running without specifying rebuildCache = FALSE.

I know you haven't deleted the NCBI.sqlite file because the error you get (Error: no such table: main.gene2pubmed) is what happens when you have a partially built NCBI.sqlite file and the function is using that instead of generating another one. Which is why I said to delete it. If you delete the database and download the files by hand, and use rebuildCache = FALSE, then the function will not try to download the files again and instead will use the files you already have (in your working directory) to rebuild the NCBI.sqlite file, and then will parse the information you need out of that database.

Also, if your E drive is not on the actual computer you are using (e.g., if it's a network drive or a OneDrive location or something similar), then don't use that as your working directory. You need to be on a regular drive on your computer for this to work.
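
The three steps can be sketched in R as follows. This is an illustration under assumptions: `prepare_ncbi_dir` is a hypothetical helper, the five file names are the ones listed in the download messages earlier in this thread, and the author and maintainer strings are placeholders.

```r
# Hypothetical pre-flight check to run before makeOrgPackageFromNCBI().
prepare_ncbi_dir <- function(dir = ".") {
  needed <- c("gene2pubmed.gz", "gene2accession.gz", "gene2refseq.gz",
              "gene_info.gz", "gene2go.gz")
  cache <- file.path(dir, "NCBI.sqlite")
  if (file.exists(cache)) file.remove(cache)     # step 1: delete the stale cache
  needed[!file.exists(file.path(dir, needed))]   # step 2: which files are missing?
}

missing_files <- prepare_ncbi_dir(".")
if (length(missing_files) > 0) {
  message("Still to download: ", paste(missing_files, collapse = ", "))
} else if (requireNamespace("AnnotationForge", quietly = TRUE)) {
  # step 3: rebuild from the local files without downloading them again
  AnnotationForge::makeOrgPackageFromNCBI(
    version = "0.1",
    author = "Your Name <you@example.org>",      # placeholder
    maintainer = "Your Name <you@example.org>",  # placeholder
    outputDir = ".",
    tax_id = "573",
    genus = "Klebsiella",
    species = "pneumoniae",
    rebuildCache = FALSE)
}
```

The check only confirms that the files exist, not that they were downloaded completely.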

ADD REPLY • link 16 months ago James W. MacDonald 68k

Sorry for asking the same questions again and again. I still feel there is some other issue: either the files are not downloading properly or there is a disk space problem. I tried to address both.

I repeated these steps twice as you suggested, but I am still getting the same error.

I deleted the NCBI.sqlite file.
I re-downloaded all the files (gene2refseq.gz, gene2accession.gz, gene2pubmed.gz, gene2go.gz, gene2ensembl.gz and gene_info.gz) from NCBI (https://ftp.ncbi.nlm.nih.gov/gene/DATA/) using wget.
I then reopened R, loaded the packages, set the working directory and ran the command with rebuildCache = FALSE.

setwd("F:/aastha/phd_lab/iron_transcriptome/transcriptome_analysis/database_clusterprofiler")

setwd()

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       maintainer = "Aastha kapoor <aasthakapoor95@gmail.com>",
                       outputDir = ".",
                       tax_id = "573",
                       genus = "Klebsiella",
                       species = "pneumoniae",
                       rebuildCache = FALSE)

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed

These commands were run on the computer's F: drive, which has 267 GB of free space.

sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8 LC_MONETARY=English_India.utf8
[4] LC_NUMERIC=C LC_TIME=English_India.utf8

time zone: Asia/Calcutta
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DBI_1.1.3 RSQLite_2.3.3 AnnotationForge_1.44.0 AnnotationDbi_1.64.1
[5] IRanges_2.36.0 S4Vectors_0.40.1 Biobase_2.62.0 BiocGenerics_0.48.1
[9] clusterProfiler_4.10.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.7 rstudioapi_0.15.0
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] zlibbioc_1.48.0 vctrs_0.6.4 memoise_2.0.1
[10] RCurl_1.98-1.13 ggtree_3.10.0 htmltools_0.5.7
[13] usethis_2.2.2 curl_5.1.0 AnnotationHub_3.10.0
[16] gridGraphics_0.5-1 htmlwidgets_1.6.3 plyr_1.8.9
[19] cachem_1.0.8 igraph_1.5.1 mime_0.12
[22] lifecycle_1.0.4 pkgconfig_2.0.3 gson_0.1.0
[25] Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
[28] GenomeInfoDbData_1.2.11 shiny_1.8.0 digest_0.6.33
[31] aplot_0.2.2 enrichplot_1.22.0 colorspace_2.1-0
[34] patchwork_1.1.3 ps_1.7.5 pkgload_1.3.3
[37] filelock_1.0.2 fansi_1.0.5 httr_1.4.7
[40] polyclip_1.10-6 compiler_4.3.2 remotes_2.4.2.1
[43] bit64_4.0.5 withr_2.5.2 BiocParallel_1.36.0
[46] viridis_0.6.4 pkgbuild_1.4.2 ggforce_0.4.1
[49] MASS_7.3-60 rappdirs_0.3.3 sessioninfo_1.2.2
[52] HDO.db_0.99.1 tools_4.3.2 interactiveDisplayBase_1.40.0
[55] scatterpie_0.2.1 ape_5.7-1 httpuv_1.6.12
[58] glue_1.6.2 callr_3.7.3 nlme_3.1-163
[61] GOSemSim_2.28.0 promises_1.2.1 shadowtext_0.1.2
[64] grid_4.3.2 reshape2_1.4.4 fgsea_1.28.0
[67] generics_0.1.3 gtable_0.3.4 tidyr_1.3.0
[70] data.table_1.14.8 tidygraph_1.2.3 utf8_1.2.4
[73] XVector_0.42.0 BiocVersion_3.18.1 ggrepel_0.9.4
[76] pillar_1.9.0 stringr_1.5.1 yulab.utils_0.1.0
[79] later_1.3.1 splines_4.3.2 dplyr_1.1.4
[82] tweenr_2.0.2 BiocFileCache_2.10.1 treeio_1.26.0
[85] lattice_0.21-9 bit_4.0.5 tidyselect_1.2.0
[88] GO.db_3.18.0 Biostrings_2.70.1 miniUI_0.1.1.1
[91] gridExtra_2.3 graphlayouts_1.0.2 devtools_2.4.5
[94] stringi_1.8.1 yaml_2.3.7 lazyeval_0.2.2
[97] ggfun_0.1.3 codetools_0.2-19 ggraph_2.1.0
[100] tibble_3.2.1 qvalue_2.34.0 BiocManager_1.30.22
[103] ggplotify_0.1.2 cli_3.6.1 xtable_1.8-4
[106] munsell_0.5.0 processx_3.8.2 Rcpp_1.0.11
[109] GenomeInfoDb_1.38.1 dbplyr_2.4.0 png_0.1-8
[112] XML_3.99-0.15 parallel_4.3.2 ellipsis_0.3.2
[115] ggplot2_3.4.4 blob_1.2.4 prettyunits_1.2.0
[118] profvis_0.3.8 DOSE_3.28.1 urlchecker_1.0.1
[121] bitops_1.0-7 viridisLite_0.4.2 tidytree_0.4.5
[124] scales_1.2.1 purrr_1.0.2 crayon_1.5.2
[127] rlang_1.1.1 cowplot_1.1.1 fastmatch_1.1-4
[130] KEGGREST_1.42.0

I have been troubling you daily. Please help me solve the issue.

ADD REPLY • link 16 months ago Aastha Kapoor • 0

That drive looks like it's a network drive, rather than a drive on your computer. You shouldn't use a network drive!

ADD REPLY • link 16 months ago James W. MacDonald 68k

After several attempts, the code has worked. I connected to a faster internet connection, set the timeout to 1000000 and ran the command. I had downloaded all the gene2 files and the gene_info file from NCBI before running the command. It gave an error that the idmapping_selected.tab.gz file could not be downloaded, so I downloaded that file using wget and reran the command with rebuildCache=FALSE. Finally, the database has been created.

How do I load the package? I tried the following, but it gives me an error.

install.packages("./org.Kpneumoniae.eg.db", repos = NULL)

Installing package into C:/Users/Aastha/AppData/Local/R/win-library/4.3 (as lib is unspecified)

Error in install.packages : type == both cannot be used with repos = NULL

ADD REPLY • link 16 months ago Aastha Kapoor • 0

This is something you should be able to work out on your own. When you get an error, you have to read it and try to figure out what it means. In this case by reading the help page:

Usage:

     install.packages(pkgs, lib, repos = getOption("repos"),
                      contriburl = contrib.url(repos, type),
                      method, available = NULL, destdir = NULL,
                      dependencies = NA, type = getOption("pkgType"),
                      configure.args = getOption("configure.args"),
                      configure.vars = getOption("configure.vars"),
                      clean = FALSE, Ncpus = getOption("Ncpus", 1L),
                      verbose = getOption("verbose"),
                      libs_only = FALSE, INSTALL_opts, quiet = FALSE,
                      keep_outputs = FALSE, ...)

And you can check what the 'type' argument is, by default.

> getOption("pkgType")
[1] "both"

And, under the Arguments section of the help page

 on Windows, file paths of '.zip' files containing binary
              builds of packages.  ('http://' and 'file://' URLs are
              also accepted and the files will be downloaded and
              installed from local copies.)  Source directories or file
              paths or URLs of archives may be specified with 'type =
              "source"', but some packages need suitable tools
              installed (see the 'Details' section).

Which leads to

install.packages("./org.Kpneumoniae.eg.db", repos = NULL, type = "source")

Which will work.
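
Put together, installing and then loading the new package might look like the sketch below. It assumes the package directory was written to the current working directory (outputDir = "."); `is_source_pkg` is a hypothetical helper that only checks for a DESCRIPTION file.

```r
pkg_dir <- "./org.Kpneumoniae.eg.db"

# Hypothetical helper: a source package directory contains a DESCRIPTION file.
is_source_pkg <- function(path) {
  dir.exists(path) && file.exists(file.path(path, "DESCRIPTION"))
}

if (is_source_pkg(pkg_dir)) {
  # The path must be a quoted string; type = "source" is needed because
  # the default type ("both") cannot be combined with repos = NULL.
  install.packages(pkg_dir, repos = NULL, type = "source")
  library(basename(pkg_dir), character.only = TRUE)
} else {
  message("No source package found at ", pkg_dir)
}
```

Once loaded, the OrgDb object has the same name as the package and can be passed to functions that accept an OrgDb.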

ADD REPLY • link 16 months ago James W. MacDonald 68k