makeOrgPackageFromNCBI generates an empty database?
@7337162e
Last seen 12 weeks ago
Germany

Hi Bioconductor community,

I am trying to build an OrgDb package for a "not-so-model" organism that has no entry in AnnotationHub. For this I used makeOrgPackageFromNCBI. Although it took a very long time, it successfully downloaded the following files: gene2pubmed.gz, gene2accession.gz, gene2refseq.gz, gene_info.gz, and gene2go.gz. It then created the organism package (with NAMESPACE, DESCRIPTION, zzz.R, and the SQLite file). I was able to load the package into R, but I ran into a problem when checking the database.

The command columns(org.Pputida.eg.db) returns what looks like a reasonable set of columns:

 [1] "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [5] "GENENAME"    "GID"         "GO"          "GOALL"
 [9] "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL"

While I can find the GO accession numbers in the database, every other column returns a single entry of "NaN". It seems to me that something went wrong while the database was being generated.

I hope someone can give a tip on how to proceed. Thanks a lot in advance.

Dissi Kratzl


makeOrgPackageFromNCBI(version = "0.1",
                       author = "Kratzl_Dissi <Dissikratzl@gmail.de>",
                       maintainer = "Kratzl_Dissi <Dissikratzl@gmail.de>",
                       outputDir = ".",
                       NCBIFilesDir = ".",
                       tax_id = "160488",
                       genus = "Pseudomonas",
                       species = "putida")


 sessionInfo()
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] org.Pputida.eg.db_0.1  AnnotationHub_3.2.2   
 [3] BiocFileCache_2.2.1    dbplyr_2.2.1          
 [5] clusterProfiler_4.2.2  devtools_2.4.5        
 [7] usethis_2.2.2          shiny_1.8.0           
 [9] biomaRt_2.50.3         GenomeInfoDb_1.30.1   
[11] AnnotationForge_1.36.0 AnnotationDbi_1.56.2  
[13] IRanges_2.28.0         S4Vectors_0.32.4      
[15] Biobase_2.54.0         BiocGenerics_0.40.0
OrganismDbi makeOrgPackageFromNCBI • 327 views
@james-w-macdonald-5106
Last seen 1 day ago
United States

After you download all those files, the next step is to generate an omnibus SQLite database that is used to create the OrgDb package. This SQLite file is called NCBI.sqlite. If I query the one I have in hand, I get this:

> library(RSQLite)
> con <- dbConnect(SQLite(), "NCBI.sqlite")
> dbGetQuery(con, "select * from gene_info where tax_id='160488';")
  tax_id gene_id   symbol locus_tag
1 160488 2830333 NEWENTRY         -
  synonyms dbXrefs chromosome
1        -       -          -
  map_location
1            -
                                                                                                                            description
1 Record to support submission of GeneRIFs for a gene not in Gene (Pseudomonas putida (strain KT2440); Pseudomonas putida str. KT2440).
  gene_type nomenclature_symbol
1     other                   -
  nomenclature_name
1                 -
  nomenclature_status
1                   -
  other_designations
1                  -
  modification_date feature_type
1          20230125            -

This indicates that NCBI has only a placeholder record for this particular strain. Other strains do have genes, however, such as Pseudomonas putida NBRC 14164:

> dbGetQuery(con, "select count(*) from gene_info where tax_id='1211579';")
  count(*)
1     5556

If that strain is close enough, you can make an OrgDb package by using its taxonomy ID instead. Note that by default the whole download/create step is repeated if it has been more than 24 hours since you last ran the code. That is completely unnecessary (and boring besides), so you should A) use the same working directory that contains the existing NCBI.sqlite file, and B) include rebuildCache = FALSE in your call to makeOrgPackageFromNCBI. In that case the function just queries the existing database for the data it needs, which should take much less time.
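As a rough sketch of such a call, assuming the substitute strain NBRC 14164 (taxonomy ID 1211579) is acceptable and that NCBI.sqlite from the earlier run sits in the current working directory, it could look like this. The version, author, and maintainer values are placeholders, and the file.exists() guard is only an illustration, not part of AnnotationForge:

```r
# Assumption: NCBI.sqlite from the previous run is in the working directory.
cache_ready <- file.exists("NCBI.sqlite")

if (cache_ready) {
  AnnotationForge::makeOrgPackageFromNCBI(
    version      = "0.1",
    author       = "Your Name <you@example.org>",   # placeholder
    maintainer   = "Your Name <you@example.org>",   # placeholder
    outputDir    = ".",
    NCBIFilesDir = ".",          # directory holding NCBI.sqlite
    tax_id       = "1211579",    # P. putida NBRC 14164 instead of 160488
    genus        = "Pseudomonas",
    species      = "putida",
    rebuildCache = FALSE         # reuse the existing NCBI.sqlite
  )
} else {
  message("NCBI.sqlite not found; change to the directory that contains it first.")
}
```

If output from the first run is still in outputDir, it may be safer to point outputDir at a fresh directory.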


Thank you for your helpful answer! It makes sense that it doesn't work.

I don't think I can work with other strains, but I tried your suggestion anyway. Unfortunately, I ran into another error, and http://status.ensembl.org/ seems to say that the server is down:

preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which
  organisms can be annotated with ensembl IDs.
Ensembl site unresponsive, trying asia mirror
Ensembl site unresponsive, trying uswest mirror
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'table' in selecting a method for function '%in%': schannel: next InitializeSecurityContext failed: SEC_E_CERT_EXPIRED (0x80090328) - The received certificate has expired.

So I guess I need to use makeOrgPackage() instead? I can provide GO IDs, gene names, locus tags, and RefSeq accessions for my organism from other databases.

Thanks again!


Unfortunately, the BioMart server can have connection issues, and then it is just a matter of waiting until it is available again.

And yes, if you have all the data, then you can just use makeOrgPackage.
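A minimal sketch of what that could look like, assuming the annotation has already been collected into data frames. All gene IDs, symbols, and locus tags below are made-up toy values, and build_org_package() is a hypothetical wrapper used only for this example. Each data frame passed to makeOrgPackage needs GID as its first column, and the GO table needs exactly the columns GID, GO, and EVIDENCE:

```r
# Toy annotation tables (hypothetical values); replace with real data.
gene_info <- data.frame(
  GID      = c("gene0001", "gene0002"),
  SYMBOL   = c("geneA", "geneB"),
  GENENAME = c("hypothetical protein A", "hypothetical protein B"),
  stringsAsFactors = FALSE
)

locus <- data.frame(
  GID       = c("gene0001", "gene0002"),
  LOCUS_TAG = c("LT_0001", "LT_0002"),
  stringsAsFactors = FALSE
)

go_table <- data.frame(
  GID      = c("gene0001", "gene0002"),
  GO       = c("GO:0008150", "GO:0003674"),  # GO root terms as stand-ins
  EVIDENCE = c("IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper so that nothing is built until it is called.
build_org_package <- function(out_dir) {
  AnnotationForge::makeOrgPackage(
    gene_info  = gene_info,
    locus      = locus,
    go         = go_table,
    version    = "0.1",
    maintainer = "Your Name <you@example.org>",  # placeholder
    author     = "Your Name <you@example.org>",  # placeholder
    outputDir  = out_dir,
    tax_id     = "160488",
    genus      = "Pseudomonas",
    species    = "putida",
    goTable    = "go"   # name of the argument above that holds the GO table
  )
}
```

Calling build_org_package(".") would then write the package source into the current directory, from where it can be installed with install.packages(..., repos = NULL, type = "source").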
