Error in makeOrgPackageFromNCBI
@dfb033f2
Finland

I need to create an OrgDb package for my microorganism, but makeOrgPackageFromNCBI() fails with the error below:

> makeOrgPackageFromNCBI(version = "0.1",
+                        author = "Cinzia Spagnoli cinzia.spagnoli@uniroma3.it",
+                        maintainer = "Cinzia Spagnoli cinzia.spagnoli@uniroma3.it",
+                        outputDir = ".",
+                        tax_id = "575584",
+                        genus = "Acinetobacter",
+                        species = "baumannii")
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
Error in prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache,  :
  no information found for species with tax id 575584
@james-w-macdonald-5106
United States

You will rarely find a particular strain in any annotation data; instead, use the species-level ('main') taxon ID, which for A. baumannii is 470. Counting the matching rows in NCBI's gene_info file shows the difference:

## how many genes for 470?
$ awk '$1 == 470' gene_info | wc -l
3733
## now how about 575584
$ awk '$1 == 575584' gene_info | wc -l
0

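I have no idea how many genes to expect for this bacterium, but the strain ID has no rows at all (which is why prepareDataFromNCBI() reports "no information found"), so you will get better results using 470.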
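
On Windows, where awk may not be available, the same check can be sketched in base R. This is a minimal sketch, not part of AnnotationForge: count_genes_for_taxid() is a hypothetical helper, and it assumes you have already downloaded gene_info.gz (or the uncompressed gene_info) from NCBI, for example the copy that makeOrgPackageFromNCBI() caches in NCBIFilesDir. It reads the file in chunks because the file is large.

```r
# Hypothetical helper (not part of AnnotationForge): count the rows of an
# NCBI gene_info file whose first tab-separated column (#tax_id) equals tax_id.
# Intended to mirror: awk '$1 == 470' gene_info | wc -l
count_genes_for_taxid <- function(path, tax_id, chunk_size = 100000L) {
  # gzfile() reads gzip-compressed files and also plain uncompressed files
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  # scientific = FALSE keeps an ID such as 100000 from becoming "1e+05"
  target <- format(tax_id, scientific = FALSE, trim = TRUE)
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    # Keep only the text before the first tab, i.e. the tax_id column;
    # the "#tax_id" header line never matches a numeric ID.
    first_col <- sub("\t.*$", "", lines)
    n <- n + sum(first_col == target)
  }
  n
}
```

For example, count_genes_for_taxid("gene_info.gz", 470) should agree with the awk count above when run on the same release of the file, and an ID with no rows, such as the strain ID, returns 0.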

I tried tax_id 470, but it still fails:

> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
+                          author = "Cinzia Spagnoli cinzia.spagnoli@uniroma3.it",
+                          maintainer = "Cinzia Spagnoli cinzia.spagnoli@uniroma3.it",
+                          outputDir = ".",
+                          tax_id = "470",
+                          genus = "Acinetobacter",
+                          species = "baumannii")
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  error reading from the connection
In addition: Warning messages:
1: In .Internal(shortRowNames(x, type)) :
  closing unused connection 3 (D:/OneDrive - Universita degli Studi Roma Tre/Documenti/gene2pubmed.gz)
2: call dbDisconnect() when finished working with a connection 
3: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  invalid or incomplete compressed data

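It might be due to either the spaces in your path or the fact that it's a OneDrive directory (the "invalid or incomplete compressed data" warning also suggests that a cached .gz file was truncated or corrupted). It's normally better to just build on the Desktop and delete the files after installing.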

> makeOrgPackageFromNCBI("0.0.1","me <me@mine.org>","me", tax_id = "470", genus = "Acinetobacter", species = "baumannii", rebuildCache = FALSE)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled

'select()' returned many:1
mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1
mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in c:/Users/jmacdon/Desktop/org.Abaumannii.eg.db 
Now deleting temporary database file
complete!
[1] "org.Abaumannii.eg.sqlite"

> install.packages("org.Abaumannii.eg.db", type = "source", repos = NULL)
Installing package into 'C:/Users/jmacdon/AppData/Local/R/win-library/4.3'
(as 'lib' is unspecified)
* installing *source* package 'org.Abaumannii.eg.db' ...
<snip>
* DONE (org.Abaumannii.eg.db)
> library(org.Abaumannii.eg.db)

> select(org.Abaumannii.eg.db, head(keys(org.Abaumannii.eg.db)), "SYMBOL")
'select()' returned 1:1 mapping
between keys and columns
       GID        SYMBOL
1 66395337          dnaA
2 66395338          dnaN
3 66395339          recF
4 66395340          gyrB
5 66395341          cybC
6 66395342 F3P16_RS00030

> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
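
If you want to follow the advice above, one option is to point both outputDir and NCBIFilesDir at a short local folder instead of the OneDrive working directory (note that outputDir = "." in the original call resolves to that directory). The sketch below assumes that the spaces/OneDrive path and a truncated download are the causes, which this thread suggests but does not confirm. The folder C:/orgdb_build, the placeholder author/maintainer strings, and the helpers is_risky_build_dir(), clear_ncbi_cache() and build_abaumannii_orgdb() are all hypothetical; only makeOrgPackageFromNCBI() and the arguments passed to it come from AnnotationForge. Sourcing the block only defines functions; nothing is downloaded until build_abaumannii_orgdb() is called.

```r
# Sketch only: the folder name, placeholder strings and helper functions are
# hypothetical; makeOrgPackageFromNCBI() and its arguments are AnnotationForge's.

# Flag directories matching the two suspected causes in this thread:
# spaces in the path, or a OneDrive-synced folder. normalizePath() expands
# relative paths such as "." to the full working-directory path first.
is_risky_build_dir <- function(path) {
  full <- normalizePath(path, winslash = "/", mustWork = FALSE)
  grepl(" ", full, fixed = TRUE) || grepl("onedrive", full, ignore.case = TRUE)
}

# Delete cached copies of the five NCBI files listed in the log, assuming a
# truncated download caused "invalid or incomplete compressed data".
# Returns (invisibly) the names of the files it removed.
clear_ncbi_cache <- function(dir) {
  ncbi_files <- c("gene2pubmed.gz", "gene2accession.gz", "gene2refseq.gz",
                  "gene_info.gz", "gene2go.gz")
  paths <- file.path(dir, ncbi_files)
  present <- paths[file.exists(paths)]
  unlink(present)
  invisible(basename(present))
}

# Build the OrgDb for the species-level taxon (470) in a local folder.
build_abaumannii_orgdb <- function(build_dir = "C:/orgdb_build") {
  if (is_risky_build_dir(build_dir)) {
    stop("use a local folder without spaces that is not synced by OneDrive")
  }
  dir.create(build_dir, showWarnings = FALSE, recursive = TRUE)
  clear_ncbi_cache(build_dir)
  AnnotationForge::makeOrgPackageFromNCBI(
    version      = "0.1",
    author       = "Your Name <you@example.org>",  # placeholder
    maintainer   = "Your Name <you@example.org>",  # placeholder
    outputDir    = build_dir,  # where org.Abaumannii.eg.db is written
    NCBIFilesDir = build_dir,  # where the large NCBI cache is kept
    tax_id       = "470",
    genus        = "Acinetobacter",
    species      = "baumannii"
  )
}
```

If the build succeeds, the org.Abaumannii.eg.db folder inside the build directory can be installed with install.packages(..., type = "source", repos = NULL) in the same way as in the session above. Clearing the cached .gz files forces a fresh download, which is slow but avoids reusing a possibly corrupted file.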

I apologize for the delay in responding. The command still doesn't work for me, though; I need to create the package from this genome: https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP058289.1/

I don't know what to tell you. I already told you that you can't build it for that strain and have to use 470 instead. I can get it to build (see above), and I told you not to use a OneDrive path. Saying 'the command still doesn't work for me' without code or output isn't helpful at all (doesn't work how?).
