makeOrgPackageFromNCBI SQL error
chefer (@chefer-23518)

I am having problems getting makeOrgPackageFromNCBI to complete. Note that this downloads about 12 GB of data from NCBI. The run fails with what looks like an SQL error while processing the GO data.

Any help would be appreciated.

Here is the code I run:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("AnnotationForge")
BiocManager::install("biomaRt")           # used to map NCBI Gene IDs to Ensembl IDs
BiocManager::install("GenomeInfoDbData")
BiocManager::install("GO.db")             # GO data checked when filling the GO tables

library(AnnotationForge)
library(biomaRt)

tx_id <- "568703"                         # NCBI taxonomy ID of the organism
makeOrgPackageFromNCBI(version = "0.1",
                       author = "placeholder@domain",
                       maintainer = "placeholder@domain",
                       outputDir = ".",
                       tax_id = tx_id,
                       genus = "Lactobacillus",
                       species = "rhamnosus")

And the error (eventually):

processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in bmRequest(request = request, verbose = verbose) :
  Internal Server Error (HTTP 500).
In addition: Warning messages:
1: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: call dbDisconnect() when finished working with a connection
3: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().

Here is my session info. I also tried R 4.0 and got a similar error message.

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /bifo/itmp/lactobacillus_rhamnosus_gg/LactobacillusRhamnosusGG/NCBI/R-env/lib/libopenblasp-r0.3.9.so

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] biomaRt_2.42.1         AnnotationForge_1.28.0 AnnotationDbi_1.48.0
[4] IRanges_2.20.2         S4Vectors_0.24.4       Biobase_2.46.0
[7] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6           GenomeInfoDb_1.22.1    compiler_3.6.3
 [4] pillar_1.4.4           dbplyr_1.4.3           prettyunits_1.1.1
 [7] bitops_1.0-6           tools_3.6.3            progress_1.2.2
[10] digest_0.6.25          bit_1.1-15.2           RSQLite_2.2.0
[13] memoise_1.1.0          BiocFileCache_1.10.2   tibble_3.0.1
[16] lifecycle_0.2.0        pkgconfig_2.0.3        rlang_0.4.6
[19] DBI_1.1.0              curl_4.3               GenomeInfoDbData_1.2.2
[22] dplyr_0.8.5            stringr_1.4.0          httr_1.4.1
[25] rappdirs_0.3.1         vctrs_0.3.0            askpass_1.1
[28] hms_0.5.3              tidyselect_1.1.0       bit64_0.9-7
[31] glue_1.4.0             R6_2.4.1               XML_3.99-0.3
[34] purrr_0.3.4            blob_1.2.1             magrittr_1.5
[37] ellipsis_0.3.0         assertthat_0.2.1       stringi_1.4.6
[40] RCurl_1.98-1.2         openssl_1.4.1          crayon_1.3.4
chefer (@chefer-23518)

I found a post on a related matter, updated the AnnotationDbi package, and the issue was resolved. For completeness, here is what I did:

BiocManager::install("remotes")               # needed: BiocManager installs GitHub packages via remotes
BiocManager::install("jmacdon/AnnotationDbi") # AnnotationDbi from the jmacdon GitHub repository

and then reran the commands above to build the package.
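The two `result_fetch()` warnings in the original log reflect a DBI convention that RSQLite enforces: statements that change the database go through `dbExecute()`, queries that return rows go through `dbGetQuery()`, and connections should be closed with `dbDisconnect()`. A minimal sketch of that pattern, assuming the DBI and RSQLite packages (both appear in the sessionInfo() output) and a made-up `genes` table; it illustrates the convention only and is not AnnotationDbi's own code:

```r
# Sketch of the DBI convention behind the result_fetch() warnings.
# Assumes DBI and RSQLite are installed; the "genes" table is made up.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Statements that do not return rows (CREATE, INSERT, UPDATE, DELETE)
# go through dbExecute(), which returns the number of rows affected.
dbExecute(con, "CREATE TABLE genes (gene_id TEXT, symbol TEXT)")
n_inserted <- dbExecute(
  con,
  "INSERT INTO genes (gene_id, symbol) VALUES ('1', 'geneA'), ('2', 'geneB')"
)

# Queries that return rows go through dbGetQuery(), which returns a data.frame.
genes <- dbGetQuery(con, "SELECT gene_id, symbol FROM genes ORDER BY gene_id")

# Release the connection explicitly instead of leaving it to garbage collection.
dbDisconnect(con)
```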


That's an orthogonal issue, but I'm happy it worked for you.

ADD REPLY
1
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States

That's actually an error from the biomaRt package, raised at the point where the function tries to map NCBI Gene IDs to Ensembl IDs. Because that is an online resource (and of late rather inconsistently available), errors like this can happen, which is irritating given how long one of these packages takes to build, particularly since there aren't any Ensembl mappings for this organism anyway. However, rerunning the same call completes here:

tx_id <- "568703"
makeOrgPackageFromNCBI(version = "0.1",
                       author = "placeholder@domain",
                       maintainer = "placeholder@domain",
                       outputDir = ".",
                       tax_id = tx_id,
                       genus = "Lactobacillus",
                       species = "rhamnosus")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
 preparing data from NCBI ...
 starting download for
 [1] gene2pubmed.gz
 [2] gene2accession.gz
 [3] gene2refseq.gz
 [4] gene_info.gz
 [5] gene2go.gz
 getting data for gene2pubmed.gz
 rebuilding the cache
 extracting data for our organism from : gene2pubmed
 getting data for gene2accession.gz
 rebuilding the cache
 extracting data for our organism from : gene2accession
 getting data for gene2refseq.gz
 rebuilding the cache
 extracting data for our organism from : gene2refseq
 getting data for gene_info.gz
 rebuilding the cache
 extracting data for our organism from : gene_info
 getting data for gene2go.gz
 rebuilding the cache
 extracting data for our organism from : gene2go
 processing gene2pubmed
 processing gene_info: chromosomes
 processing gene_info: description
 processing alias data
 processing refseq data
 processing accession data
 processing GO data
 Please be patient while we work out which organisms can be annotated
   with ensembl IDs.
 making the OrgDb package ...
 Populating genes table:
 genes table filled
 Populating pubmed table:
 pubmed table filled
 Populating gene_info table:
 gene_info table filled
 Populating entrez_genes table:
 entrez_genes table filled
 Populating alias table:
 alias table filled
 Populating refseq table:
 refseq table filled
 Populating accessions table:
 accessions table filled
 Populating go table:
 go table filled
 table metadata filled

 'select()' returned many:1 mapping between keys and columns
 Dropping GO IDs that are too new for the current GO.db
 Populating go table:
go table filled
 Populating go_bp table:
 go_bp table filled
 Populating go_cc table:
 go_cc table filled
 Populating go_mf table:
 go_mf table filled
 'select()' returned many:1 mapping between keys and columns
 Populating go_bp_all table:
 go_bp_all table filled
 Populating go_cc_all table:
 go_cc_all table filled
 Populating go_mf_all table:
 go_mf_all table filled
 Populating go_all table:
 go_all table filled
 Creating package in ./org.Lrhamnosus.eg.db
 Now deleting temporary database file
 complete!
 [1] "org.Lrhamnosus.eg.sqlite"
 There were 50 or more warnings (use warnings() to see the first 50)
 >
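Because the biomaRt step depends on an online service, one option is to wrap the call in a small retry loop. The helper below, `with_retries()`, is hypothetical (it is not part of AnnotationForge or biomaRt). It assumes, per the message at the start of the run, that the NCBI files cached in the NCBIFilesDir directory are reused by later calls, so a retry on the same day should be much faster than the first attempt.

```r
# Hypothetical helper, not part of AnnotationForge or biomaRt: call f()
# up to max_attempts times, pausing between failed attempts, and re-raise
# the last error if every attempt fails.
with_retries <- function(f, max_attempts = 3, wait_seconds = 60) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message(sprintf("Attempt %d of %d failed: %s",
                    attempt, max_attempts, conditionMessage(result)))
    if (attempt < max_attempts) {
      Sys.sleep(wait_seconds)
    }
  }
  stop(result)
}

# Example (commented out because it starts the full NCBI download):
# with_retries(function() {
#   makeOrgPackageFromNCBI(version = "0.1",
#                          author = "placeholder@domain",
#                          maintainer = "Name Surname <placeholder@domain>",
#                          outputDir = ".",
#                          tax_id = "568703",
#                          genus = "Lactobacillus",
#                          species = "rhamnosus")
# })
```

Only errors are retried; warnings such as the "50 or more warnings" above pass through untouched.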

Do note, though, that your maintainer field won't work: R expects it in the form "Name <email>".

> install.packages("org.Lrhamnosus.eg.db/", repos = NULL)
 * installing *source* package ‘org.Lrhamnosus.eg.db’ ...
 ** using staged installation
 Error : Invalid DESCRIPTION file

 Malformed maintainer field.
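If the package has already been generated with a malformed maintainer, one way to patch it before installing is to rewrite the `Maintainer` field of the generated DESCRIPTION file with base R's `read.dcf()` and `write.dcf()`. This is a sketch: `fix_maintainer()` is a hypothetical helper, and the name in the example is a placeholder.

```r
# Sketch: rewrite the Maintainer field of an already-generated package
# directory before installing it. fix_maintainer() is a hypothetical helper
# built on base R's read.dcf()/write.dcf().
fix_maintainer <- function(pkg_dir, maintainer) {
  desc_path <- file.path(pkg_dir, "DESCRIPTION")
  desc <- read.dcf(desc_path)             # one-row character matrix of fields
  desc[, "Maintainer"] <- maintainer      # the field must already exist
  write.dcf(desc, file = desc_path)
  invisible(desc_path)
}

# Example, with a placeholder name:
# fix_maintainer("org.Lrhamnosus.eg.db", "Name Surname <placeholder@domain>")
# install.packages("org.Lrhamnosus.eg.db/", repos = NULL)
```

`write.dcf()` may re-wrap long fields such as Description; continuation lines are valid DCF, so this does not affect installation.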

I fixed the maintainer field and was able to install the package. If you want it, send me an email at jmacdon at uw dot edu and I'll get it to you.


Thanks so much. I fixed the maintainer field manually after the fact, and will contact you about the package in any case. For completeness, the call with the corrected maintainer field is below:

makeOrgPackageFromNCBI(version = "0.1",
                       author = "placeholder@domain",
                       maintainer = "Name Surname <placeholder@domain>",
                       outputDir = ".",
                       tax_id = tx_id,
                       genus = "Lactobacillus",
                       species = "rhamnosus")
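To catch this before a multi-hour build, the maintainer string can be checked for the "Name <email>" shape up front. `looks_like_maintainer()` is a hypothetical helper, and its regular expression is a rough approximation, not the exact rule R applies when it validates a DESCRIPTION file.

```r
# Hypothetical helper: rough check that a maintainer string has the
# "Name <email>" shape expected in a DESCRIPTION file. The regular
# expression is a simplification, not the exact rule R applies.
looks_like_maintainer <- function(x) {
  grepl("^[^<>]*[^<>[:space:]][[:space:]]*<[^<>@[:space:]]+@[^<>[:space:]]+>$", x)
}

looks_like_maintainer("placeholder@domain")                 # FALSE
looks_like_maintainer("Name Surname <placeholder@domain>")  # TRUE

# Fail fast, before the long download starts:
stopifnot(looks_like_maintainer("Name Surname <placeholder@domain>"))
```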