Hello everyone,
I am a PhD student performing GSEA on Candida albicans data. I queried AnnotationHub for an existing OrgDb record and found none, so I am trying to build an organism database (OrgDb) package for C. albicans. I have gone through the threads "Problem making orgdb package for bacteria (Pseudomonas) using annotation hub and annotation forge", "error with downloading NCBI data for makeorgpackagefromNCBI?", and "AnnotationForge not working for building custom org packages", but I am still encountering the error shown below. I have tried downloading the files directly from https://ftp.ncbi.nlm.nih.gov/gene/DATA/ after deleting the NCBI.sqlite file, but to no avail. I even tried raising the timeout option to 10000. Any help would be highly appreciated.
> hub <- AnnotationHub()
|=====================================================================================| 100%
snapshotDate(): 2023-10-23
> query(hub, c("OrgDb","Candida albicans"))
AnnotationHub with 0 records
# snapshotDate(): 2023-10-23
> getOption('timeout')
[1] 60
> options(timeout = 10000)
> getOption('timeout')
[1] 10000
> makeOrgPackageFromNCBI("0.1",
+ "Gayatri <gayatri@catg.edu.in>",
+ "Gayatri",
+ ".",
+ "237561",
+ "Candida",
+ "albicans",
+ rebuildCache = FALSE)
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed
> list.files()
[1] "gene_info.gz" "gene2accession.gz" "gene2go.gz" "gene2pubmed.gz"
[5] "gene2refseq.gz"
> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Asia/Calcutta
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationForge_1.44.0 biomaRt_2.58.2 AnnotationHub_3.10.1
[4] BiocFileCache_2.10.2 dbplyr_2.5.0 GenomeInfoDb_1.38.8
[7] AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2
[10] Biobase_2.62.0 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] KEGGREST_1.42.0 vctrs_0.6.5 tools_4.3.3
[4] bitops_1.0-7 generics_0.1.3 curl_5.2.1
[7] tibble_3.2.1 fansi_1.0.6 RSQLite_2.3.6
[10] blob_1.2.4 pkgconfig_2.0.3 lifecycle_1.0.4
[13] GenomeInfoDbData_1.2.11 compiler_4.3.3 stringr_1.5.1
[16] Biostrings_2.70.3 progress_1.2.3 httpuv_1.6.15
[19] htmltools_0.5.8.1 RCurl_1.98-1.14 yaml_2.3.8
[22] interactiveDisplayBase_1.40.0 pillar_1.9.0 later_1.3.2
[25] crayon_1.5.2 cachem_1.0.8 mime_0.12
[28] tidyselect_1.2.1 digest_0.6.35 stringi_1.8.3
[31] dplyr_1.1.4 BiocVersion_3.18.1 fastmap_1.1.1
[34] cli_3.6.2 magrittr_2.0.3 XML_3.99-0.16.1
[37] utf8_1.2.4 prettyunits_1.2.0 filelock_1.0.3
[40] promises_1.3.0 rappdirs_0.3.3 bit64_4.0.5
[43] XVector_0.42.0 httr_1.4.7 bit_4.0.5
[46] png_0.1-8 hms_1.1.3 memoise_2.0.1
[49] shiny_1.8.1.1 rlang_1.1.3 Rcpp_1.0.12
[52] xtable_1.8-4 glue_1.7.0 DBI_1.2.2
[55] xml2_1.3.6 BiocManager_1.30.22 R6_2.5.1
[58] zlibbioc_1.48.2
Warning message:
call dbDisconnect() when finished working with a connection
Hello James,
How do I generate the db? By running makeOrgPackageFromNCBI()? I have tried that, first by deleting only the NCBI.sqlite file and running the command, and then by deleting both NCBI.sqlite and the gene2pubmed file, since that file is around 180 MB. But I still get an error that the gene2accession file was only partially transferred, because it is re-downloaded too, even though it had already been downloaded before I ran the command.
I don't really follow what you are saying. All you have to do is delete the NCBI.sqlite db and re-run the script exactly as you did above. If you say rebuildCache = FALSE you shouldn't download anything. And the error you got before didn't say anything about downloading files. It said that you were missing the gene2pubmed table.
What I meant is that even after deleting the NCBI.sqlite file and re-running the script, an empty NCBI.sqlite file (0 KB) is created, which causes the error I mentioned:
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed
I made sure any partially created NCBI.sqlite files were deleted, downloaded the data directly from NCBI, and only then re-ran the script. But the creation of this empty NCBI.sqlite file is causing the script to terminate. What do you suggest I do about this?
That's weird. I don't have any problem at all generating the OrgDb on my box. How big is the gene2pubmed.gz file? I get this:
So just over 57M rows. I get fewer from the NCBI.sqlite file, but it's definitely there.
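The command behind the row count above is not preserved in this thread. As a rough sketch only, one way to check the size and row count of the downloaded file in base R is shown below. The helper name count_gz_lines is hypothetical, and the code assumes gene2pubmed.gz sits in the working directory; a truncated download may stop early or raise an error while reading, which is itself a sign the file is incomplete.

```r
# Hypothetical helper (not part of AnnotationForge): count the lines in a
# gzipped text file without loading the whole file into memory.
count_gz_lines <- function(path, chunk = 1000000L) {
  con <- gzfile(path, open = "r")
  on.exit(close(con), add = TRUE)
  n <- 0
  repeat {
    got <- length(readLines(con, n = chunk, warn = FALSE))
    if (got == 0) break
    n <- n + got
  }
  n
}

# Assumption: the file was downloaded into the current working directory.
gz <- "gene2pubmed.gz"
if (file.exists(gz)) {
  cat("size (MB):", round(file.size(gz) / 1024^2, 1), "\n")
  cat("lines:", count_gz_lines(gz), "\n")
}
```

The count includes the header line, so it will be one more than the number of data rows.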
Hello James,
Thank you so much for suggesting ways to solve my problem. After many attempts, and after waiting for a better internet connection, I got the command to work and now have my organism package built successfully.
Regards,
Gayatri Brahmandam.
Hi, I have the same issue as you! Does this happen because of internet speed? I have already downloaded these files twice!
That error indicates that you have a file called NCBI.sqlite in your working directory that is missing some tables. You should delete that file and then run makeOrgPackageFromNCBI again, using the same arguments. This will re-generate the correct NCBI.sqlite file and then create the package.
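The check described above could be sketched roughly as follows, using the DBI and RSQLite packages that appear in the sessionInfo() output earlier in the thread. The helper name missing_cache_tables is hypothetical, and the list of expected tables is an assumption: only gene2pubmed is confirmed by the error message, and the others are guessed from the downloaded file names.

```r
# Hypothetical helper: report which expected tables are absent from the cache.
# Assumption: table names match the NCBI file names without ".gz"; only
# "gene2pubmed" is confirmed by the error message in this thread.
missing_cache_tables <- function(db_path = "NCBI.sqlite",
                                 expected = c("gene2pubmed", "gene2accession",
                                              "gene2refseq", "gene_info",
                                              "gene2go")) {
  # A missing or 0-byte file cannot contain any tables.
  if (!file.exists(db_path) || file.size(db_path) == 0) {
    return(expected)
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  setdiff(expected, DBI::dbListTables(con))
}

# Delete the cache only if it exists and is incomplete.
db <- "NCBI.sqlite"
if (file.exists(db)) {
  missing <- missing_cache_tables(db)
  if (length(missing) > 0) {
    message("Removing incomplete ", db, "; missing tables: ",
            paste(missing, collapse = ", "))
    file.remove(db)
  }
}
```

After the stale file is removed, re-running makeOrgPackageFromNCBI with the same arguments as before should rebuild the cache.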