I ran makeOrgPackageFromNCBI to create an annotation package. The following files were downloaded: [1] gene2pubmed.gz [2] gene2accession.gz [3] gene2refseq.gz [4] gene_info.gz [5] gene2go.gz
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <so@someplace.org>",
                       maintainer = "Some One <so@someplace.org>",
                       outputDir = ".",
                       tax_id = "7137",
                       genus = "Galleria",
                       species = "Galleria mellonella",
                       rebuildCache = TRUE)
Output:
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in function (type, msg, asError = TRUE) : FTP response timeout
In addition: Warning messages:
1: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: call dbDisconnect() when finished working with a connection
3: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
4: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
5: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
6: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] AnnotationForge_1.32.0 AnnotationDbi_1.52.0 IRanges_2.24.1
[4] S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 XML_3.99-0.5 bitops_1.0-6
[4] DBI_1.1.1 RSQLite_2.2.3 cachem_1.0.4
[7] rlang_0.4.10 blob_1.2.1 vctrs_0.3.6
[10] tools_4.0.2 bit64_4.0.5 RCurl_1.98-1.2
[13] bit_4.0.4 fastmap_1.1.0 yaml_2.2.1
[16] compiler_4.0.2 pkgconfig_2.0.3 BiocManager_1.30.10
[19] memoise_2.0.0
>
Mine is still running after ~12 hours but is stalled on the 'processing GO data' step. An NCBI.sqlite file of ~32 GB has been created, along with all of the other typical files (gene2accession.gz, gene2go.gz, etc.). I'll let you know if it ever finishes or returns a timeout error.
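To tell whether a long run is still making progress, it can help to check the sizes of the cache files from a second R session every so often. Below is a minimal sketch; `cache_sizes_gb` is a hypothetical helper (not part of AnnotationForge), and it assumes the cache lives in the working directory, which I believe is the default for `NCBIFilesDir` unless you set it yourself.

```r
# Hypothetical helper (not an AnnotationForge function): list the regular
# files in `dir` together with their sizes in GB.
# Assumption: `dir` is the directory used as NCBIFilesDir.
cache_sizes_gb <- function(dir = ".") {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # drop sub-directories
  data.frame(
    file = basename(files),
    size_gb = round(file.size(files) / 1024^3, 2),
    stringsAsFactors = FALSE
  )
}

# Example: run this a few minutes apart and compare the NCBI.sqlite row.
# cache_sizes_gb(".")
```

If the size of NCBI.sqlite stops changing between checks while the step is still "running", that suggests the process is stuck (or swapping) rather than working.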
Okay, in my case the run stopped because I ran out of memory, but I never received an FTP timeout error. So it should finish eventually.
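If the FTP timeout is intermittent, one option is simply to retry the call a few times. Here is a rough sketch using base R only; `retry` is a hypothetical helper I made up for illustration (it is not provided by AnnotationForge), and whether a retry actually gets past the 'processing GO data' step depends on why the FTP request timed out.

```r
# Hypothetical helper: call `f` until it succeeds or `max_attempts` is
# reached, sleeping `wait_seconds` between failed attempts.
retry <- function(f, max_attempts = 3, wait_seconds = 60) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    message("attempt ", attempt, " failed: ", conditionMessage(result$error))
    if (attempt < max_attempts) Sys.sleep(wait_seconds)
  }
  stop("all ", max_attempts, " attempts failed; last error: ",
       conditionMessage(result$error))
}

# Usage sketch: wrap the original call in a function and pass it to retry(),
# using the same arguments as in the question.
# retry(function() makeOrgPackageFromNCBI(version = "0.1", tax_id = "7137"))
```

The usage line is abbreviated; the real call needs the full set of arguments from the question. Note that this will not help with an out-of-memory failure, only with errors that go away on a later attempt.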