Hi!
I am trying to use makeOrgPackageFromNCBI() to build my own organism annotation package for Acinetobacter baumannii ACICU (taxid: 405416), but I got the following error "Error in prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, : no information found for species with tax id 405416".
> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
+ author = "Irene <myemail@xxx.it>",
+ maintainer = "Irene <myemail@xxx.it>",
+ outputDir = ".",
+ tax_id = "405416",
+ genus = "Acinetobacter",
+ species = "baumannii")
If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
Error in prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, :
no information found for species with tax id 405416
I'd appreciate any feedback!
Thanks in advance,
Irene
So, how can I perform a GSEA if my gene identifiers are "locus_tag" values from the NCBI GenBank file, and I cannot build the database for that specific strain, only for Acinetobacter baumannii as a whole?
Thanks!
That's a tough one. What you need are mappings from those locus tags to whatever ontology you want to use (GO or KEGG, presumably). Unfortunately, what NCBI appears to have are Gene IDs for the species (although they say it's strain K09-14? I know nothing about the various strains of this bacterium). Anyway, there appears to be some information about taxid 470 in the data downloads.
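A sketch of the original call rerun with the species-level taxid 470 instead of the strain-level 405416. Everything except tax_id is copied from the question; NCBIFilesDir (where NCBI.sqlite is cached) and rebuildCache are existing makeOrgPackageFromNCBI arguments. The first 470 run keeps rebuildCache = TRUE, and later runs can switch it to FALSE to reuse the cache. The interactive() guard is only there so that sourcing the snippet does not start the very large NCBI download by accident.

```r
# Arguments from the original post, except tax_id:
# 470 is the species-level taxid for Acinetobacter baumannii.
build_args <- list(
  version      = "0.1",
  author       = "Irene <myemail@xxx.it>",
  maintainer   = "Irene <myemail@xxx.it>",
  outputDir    = ".",
  tax_id       = "470",
  genus        = "Acinetobacter",
  species      = "baumannii",
  NCBIFilesDir = ".",   # directory holding the NCBI.sqlite cache
  rebuildCache = TRUE   # set to FALSE on later runs to reuse NCBI.sqlite
)

# Only build when AnnotationForge is installed and the session is interactive,
# because an uncached run downloads and parses several large NCBI files.
if (interactive() && requireNamespace("AnnotationForge", quietly = TRUE)) {
  do.call(AnnotationForge::makeOrgPackageFromNCBI, build_args)
}
```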
The GO mappings will come from a file downloaded from UniProt, and searching on their site for that species brings up results for multiple different strains. But if you use 470 as the taxid, there seem to be quite a few mappings. When you run makeOrgPackageFromNCBI, it will download all the files, parse them, and put the data in a SQLite database called 'NCBI.sqlite'. If you re-run that function and specify rebuildCache = FALSE, you will simply reuse that SQLite database (which is what you should do!). Anyway, there are lots of GO mappings.

The only remaining trick is to map whatever ID you have to something that will be in the resulting orgDb. I don't know what a locus_tag is, but hopefully it's a GenBank or RefSeq tag that you can match to NCBI IDs.
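A minimal sketch of that last mapping step, followed by turning GO annotations into gene sets. It relies on one property of NCBI's gene_info file: it has a LocusTag column alongside GeneID, so any locus tags NCBI knows about can be matched to Gene IDs. The two small data frames are hypothetical stand-ins (made-up GeneIDs, locus tags, and GO term labels) for the real gene_info rows and for a GeneID-to-GO table extracted from the new orgDb. Whether the ACICU locus tags actually appear under taxid 470 is not established here. Unmatched tags simply come back as NA.

```r
# Hypothetical stand-in for NCBI gene_info rows (the real file has more columns).
gene_info <- data.frame(
  GeneID   = c(1001L, 1002L, 1003L),
  LocusTag = c("TAG_0001", "TAG_0002", "TAG_0003"),
  stringsAsFactors = FALSE
)

# Hypothetical GeneID-to-GO table, e.g. pulled from the generated orgDb.
go_map <- data.frame(
  GeneID = c(1001L, 1002L, 1002L, 1003L),
  GO     = c("GO_TERM_A", "GO_TERM_A", "GO_TERM_B", "GO_TERM_B"),
  stringsAsFactors = FALSE
)

# Locus tags from your own GenBank annotation (hypothetical values).
my_tags <- c("TAG_0002", "TAG_0003", "TAG_9999")

# 1. Map locus tags to GeneIDs; tags missing from gene_info become NA.
mapping <- data.frame(
  LocusTag = my_tags,
  GeneID   = gene_info$GeneID[match(my_tags, gene_info$LocusTag)],
  stringsAsFactors = FALSE
)
print(mapping)

# 2. Build gene sets: one list element per GO term, holding its annotated GeneIDs.
gene_sets <- split(go_map$GeneID, go_map$GO)
str(gene_sets)
```

A named list like gene_sets, together with a ranking statistic for each mapped GeneID, is the general shape of input that GSEA implementations take. The number of NA values in mapping gives a quick measure of how much of the strain's annotation carries over to the species-level IDs.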