Error for AnnotationForge makeOrgPackageFromNCBI function
Gayatri • 0

Hello everyone,

I am a PhD student performing GSEA on Candida albicans data. I queried AnnotationHub for existing OrgDb records and found none, so I am trying to build an organism database (OrgDb) for C. albicans. I have gone through the threads "Problem making orgdb package for bacteria (Pseudomonas) using annotation hub and annotation forge", "error with downloading NCBI data for makeorgpackagefromNCBI?", and "AnnotationForge not working for building custom org packages", but I am still encountering the following error. I have tried downloading the files directly from NCBI after deleting the NCBI.sqlite file, but to no avail. I even tried raising the timeout setting to 10000. Any help would be highly appreciated.

> hub <- AnnotationHub()
  |=====================================================================================| 100%

snapshotDate(): 2023-10-23
> query(hub, c("OrgDb","Candida albicans"))
AnnotationHub with 0 records
# snapshotDate(): 2023-10-23

> getOption('timeout')
[1] 60
> options(timeout = 10000)
> getOption('timeout')
[1] 10000

> makeOrgPackageFromNCBI("0.1", 
+                        "Gayatri <>", 
+                        "Gayatri", 
+                        ".", 
+                        "237561", 
+                        "Candida", 
+                        "albicans", 
+                        rebuildCache = FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed

> list.files()
[1] "gene_info.gz"      "gene2accession.gz" "gene2go.gz"        "gene2pubmed.gz"   
[5] "gene2refseq.gz" 

> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Asia/Calcutta
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationForge_1.44.0 biomaRt_2.58.2         AnnotationHub_3.10.1  
 [4] BiocFileCache_2.10.2   dbplyr_2.5.0           GenomeInfoDb_1.38.8   
 [7] AnnotationDbi_1.64.1   IRanges_2.36.0         S4Vectors_0.40.2      
[10] Biobase_2.62.0         BiocGenerics_0.48.1   

loaded via a namespace (and not attached):
 [1] KEGGREST_1.42.0               vctrs_0.6.5                   tools_4.3.3                  
 [4] bitops_1.0-7                  generics_0.1.3                curl_5.2.1                   
 [7] tibble_3.2.1                  fansi_1.0.6                   RSQLite_2.3.6                
[10] blob_1.2.4                    pkgconfig_2.0.3               lifecycle_1.0.4              
[13] GenomeInfoDbData_1.2.11       compiler_4.3.3                stringr_1.5.1                
[16] Biostrings_2.70.3             progress_1.2.3                httpuv_1.6.15                
[19] htmltools_0.5.8.1             RCurl_1.98-1.14               yaml_2.3.8                   
[22] interactiveDisplayBase_1.40.0 pillar_1.9.0                  later_1.3.2                  
[25] crayon_1.5.2                  cachem_1.0.8                  mime_0.12                    
[28] tidyselect_1.2.1              digest_0.6.35                 stringi_1.8.3                
[31] dplyr_1.1.4                   BiocVersion_3.18.1            fastmap_1.1.1                
[34] cli_3.6.2                     magrittr_2.0.3                XML_3.99-0.16.1              
[37] utf8_1.2.4                    prettyunits_1.2.0             filelock_1.0.3               
[40] promises_1.3.0                rappdirs_0.3.3                bit64_4.0.5                  
[43] XVector_0.42.0                httr_1.4.7                    bit_4.0.5                    
[46] png_0.1-8                     hms_1.1.3                     memoise_2.0.1                
[49] shiny_1.8.1.1                 rlang_1.1.3                   Rcpp_1.0.12                  
[52] xtable_1.8-4                  glue_1.7.0                    DBI_1.2.2                    
[55] xml2_1.3.6                    BiocManager_1.30.22           R6_2.5.1                     
[58] zlibbioc_1.48.2              
Warning message:
call dbDisconnect() when finished working with a connection
OrgDb AnnotationForge • 415 views

That error indicates that your NCBI.sqlite database is missing the gene2pubmed table, so you should regenerate that db. Just delete it and then rerun the script exactly as you already have.
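As a rough sketch of the clean-up step described above: this assumes the cache database sits in the working directory under the name NCBI.sqlite, as in the question. `remove_stale_cache` is a hypothetical helper written for illustration, not an AnnotationForge function.

```r
# Hypothetical helper (not part of AnnotationForge): delete a leftover cache
# database so the next makeOrgPackageFromNCBI() run has to rebuild it.
remove_stale_cache <- function(dir = ".", db_name = "NCBI.sqlite") {
  db_path <- file.path(dir, db_name)
  if (!file.exists(db_path)) {
    message("No ", db_name, " found in ", normalizePath(dir))
    return(invisible(FALSE))
  }
  size <- file.size(db_path)
  removed <- file.remove(db_path)
  message("Removed ", db_path, " (", size, " bytes)")
  invisible(removed)  # TRUE if the file was deleted
}
```

After calling `remove_stale_cache()`, the makeOrgPackageFromNCBI() call from the question can be rerun unchanged.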

Hello James,

How do I regenerate the db? By running the makeOrgPackageFromNCBI() command? I have tried that, first by deleting only the NCBI.sqlite file and running the command, and then by deleting both the NCBI.sqlite file and the gene2pubmed file, since that file is around 180 MB. But I still get an error that the gene2accession file was only partially transferred, because it is re-downloaded too, even though it had already been downloaded before I ran the command.

I don't really follow what you are saying. All you have to do is delete the NCBI.sqlite db and re-run the script exactly as you did above. If you say rebuildCache = FALSE you shouldn't download anything. And the error you got before didn't say anything about downloading files. It said that you were missing the gene2pubmed table.

What I meant is that even after deleting the NCBI.sqlite file and re-running the script, an empty NCBI.sqlite file (0 KB) is created, which causes the error I mentioned:

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed

I made sure any partially created NCBI.sqlite files were deleted, downloaded the data directly from NCBI, and only then re-ran the script. But the creation of this empty NCBI.sqlite file causes the script to terminate. What do you suggest I do about this?
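A quick way to see the state described here is to tabulate which of the files are present and whether any of them are zero bytes. The sketch below is only an illustration: the file names are the five listed in the output above plus NCBI.sqlite, and `report_ncbi_files` is a hypothetical helper, not something AnnotationForge provides. It does not check that a .gz file is complete, only that it exists and is non-empty.

```r
# The five NCBI files named in the download message above.
ncbi_files <- c("gene2pubmed.gz", "gene2accession.gz", "gene2refseq.gz",
                "gene_info.gz", "gene2go.gz")

# Hypothetical helper: report presence and size of the NCBI files and the
# cache database in `dir`, flagging any file that exists but is empty.
report_ncbi_files <- function(dir = ".", files = c(ncbi_files, "NCBI.sqlite")) {
  paths <- file.path(dir, files)
  size <- file.size(paths)  # NA for files that do not exist
  data.frame(
    file = files,
    exists = file.exists(paths),
    size_mb = round(size / 1024^2, 1),
    empty = !is.na(size) & size == 0,
    stringsAsFactors = FALSE
  )
}
```

Calling `report_ncbi_files(".")` in the directory used for the build returns one row per file; a row with `empty = TRUE` for NCBI.sqlite corresponds to the 0 KB file mentioned above.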

That's weird. I don't have any problem at all generating the OrgDb on my box. How big is the gene2pubmed.gz file? I get this:

gzip -dc gene2pubmed.gz | wc -l

So just over 57M rows. I get fewer from the NCBI.sqlite file, but it's definitely there.

> library(RSQLite)
Warning message:
package 'RSQLite' was built under R version 4.3.2 
> con <- dbConnect(SQLite(), "NCBI.sqlite")
> dbGetQuery(con, "select count(*) from gene2pubmed;") 
1  4843195
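
The same check can be extended to every table in the file, which also shows at a glance whether gene2pubmed exists at all. This is a minimal sketch using standard DBI/RSQLite calls; `table_row_counts` is a hypothetical helper name, and the path passed to it is assumed to be the NCBI.sqlite file from the build directory.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: return a named vector of row counts, one per table,
# for the SQLite database at `db_path`.
table_row_counts <- function(db_path) {
  # dbConnect() would silently create a missing file, so check first.
  stopifnot(file.exists(db_path))
  con <- dbConnect(SQLite(), db_path)
  on.exit(dbDisconnect(con), add = TRUE)  # always close the connection
  tables <- dbListTables(con)
  vapply(tables, function(tbl) {
    sql <- paste0("SELECT COUNT(*) AS n FROM ", dbQuoteIdentifier(con, tbl))
    as.numeric(dbGetQuery(con, sql)$n)
  }, numeric(1))
}
```

For example, `table_row_counts("NCBI.sqlite")` returns a zero-length result for an empty database file, which would be consistent with the "no such table" error earlier in the thread.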

Hello James,

Thank you so much for suggesting ways to solve my query. After many trials, and waiting for good internet speed, I got the command to work and now successfully have my organism package built.


Gayatri Brahmandam.

