Hi Bioconductor community,
I am trying to build an OrgDb package for a "not-so-model" organism that lacks an entry in AnnotationHub(). For this I used makeOrgPackageFromNCBI(). Although it took a very long time, it successfully downloaded the following files: [1] gene2pubmed.gz [2] gene2accession.gz [3] gene2refseq.gz [4] gene_info.gz [5] gene2go.gz and then created the organism package (with NAMESPACE, DESCRIPTION, zzz.R, and the SQLite file). I was able to load the package into R, but when checking the database I got an error.
The command columns(org.Pputida.eg.db) returns what looks like a reasonable set of columns:
 [1] "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [5] "GENENAME"    "GID"         "GO"          "GOALL"
 [9] "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL"
While I can find the GO accession numbers in the database, every other column returns a single entry of "NaN". It seems to me that something went wrong during the database generation process?
I hope someone can give a tip on how to proceed. Thanks a lot in advance.
Dissi Kratzl
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Kratzl_Dissi <Dissikratzl@gmail.de>",
                       maintainer = "Kratzl_Dissi <Dissikratzl@gmail.de>",
                       outputDir = ".",
                       NCBIFilesDir = ".",
                       tax_id = "160488",
                       genus = "Pseudomonas",
                       species = "putida")
sessionInfo()
[1] LC_COLLATE=German_Germany.1252
[2] LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats4 stats graphics grDevices utils
[6] datasets methods base
other attached packages:
[1] org.Pputida.eg.db_0.1 AnnotationHub_3.2.2
[3] BiocFileCache_2.2.1 dbplyr_2.2.1
[5] clusterProfiler_4.2.2 devtools_2.4.5
[7] usethis_2.2.2 shiny_1.8.0
[9] biomaRt_2.50.3 GenomeInfoDb_1.30.1
[11] AnnotationForge_1.36.0 AnnotationDbi_1.56.2
[13] IRanges_2.28.0 S4Vectors_0.32.4
[15] Biobase_2.54.0 BiocGenerics_0.40.0
Thank you for your helpful answer! It makes sense that it doesn't work.
I think I cannot work with other strains, but I still tried your suggestion. Unfortunately, I ran into another error: when I open http://status.ensembl.org/, it says the server is down.
In any case, I guess I need to use makeOrgPackage() then? I can provide GO IDs, gene names, locus tags, and RefSeq accessions from other databases for my organism.
Thanks again!
Unfortunately, the BioMart server can have connection issues; it is just a matter of waiting until it is available again.
And yes, if you have all the data, then you can just use makeOrgPackage().
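As a rough sketch of what that could look like: makeOrgPackage() takes one data.frame per annotation table, each with GID as its first column and no duplicated rows, and the GO table needs exactly the columns GID, GO and EVIDENCE. Everything in the tables below (gene IDs, symbols, locus tags, RefSeq accessions, evidence codes) is a made-up placeholder, and build_org_package() is just a hypothetical wrapper name, so replace them with your own data.

```r
# Placeholder annotation tables; none of these values are real
# Pseudomonas putida records. GID must be the first column of every table.
gene_info <- data.frame(
  GID      = c("1", "2", "3"),
  SYMBOL   = c("geneA", "geneB", "geneC"),
  GENENAME = c("hypothetical protein A",
               "hypothetical protein B",
               "hypothetical protein C"),
  stringsAsFactors = FALSE
)

locus <- data.frame(
  GID      = c("1", "2", "3"),
  LOCUSTAG = c("TAG_0001", "TAG_0002", "TAG_0003"),
  REFSEQ   = c("XX_000001.1", "XX_000002.1", "XX_000003.1"),
  stringsAsFactors = FALSE
)

# The GO table must have exactly these three columns, in this order.
go <- data.frame(
  GID      = c("1", "1", "2"),
  GO       = c("GO:0008150", "GO:0003674", "GO:0005575"),
  EVIDENCE = c("IEA", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper so that nothing is written to disk until it is called.
# Requires the AnnotationForge package.
build_org_package <- function(out_dir = ".") {
  AnnotationForge::makeOrgPackage(
    gene_info  = unique(gene_info),  # unique() guards against duplicated rows
    locus      = unique(locus),
    go         = unique(go),
    version    = "0.1",
    maintainer = "Some One <someone@example.org>",  # placeholder
    author     = "Some One <someone@example.org>",  # placeholder
    outputDir  = out_dir,
    tax_id     = "160488",
    genus      = "Pseudomonas",
    species    = "putida",
    goTable    = "go"  # name of the argument that holds the GO table
  )
}
```

Calling build_org_package() should write the package source directory into out_dir; you can then install it with install.packages(<path>, repos = NULL, type = "source") and query it with columns() and select() as usual.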