Hi Marc and others,
I am trying to learn how to use makeOrgPackageFromNCBI() to build organism packages, but I keep running into problems. I would really appreciate any suggestions. Thank you very much!
Please see the three problems described in detail below (maybe that is a lot to ask, but I really hope to get some hints from you; thanks again):
1> I am running these functions in R 3.3.1 on Windows 7.
2> I have already downloaded the files the function needs (gene2pubmed.gz, gene2accession.gz, gene2refseq.gz, gene_info.gz, gene2go.gz, NCBI.sqlite, idmapping_selected.tab.gz). The code is shown below:
a. The first error: 'error in statement: no such table: altGO_date'
library(AnnotationForge)
library(AnnotationDbi)
library(GenomeInfoDb)
library(biomaRt)
makeOrgPackageFromNCBI(
version="0.1",
maintainer="Guido Hooiveld <guido.hooiveld@wur.nl>",
author="Guido Hooiveld <guido.hooiveld@wur.nl>",
outputDir=".",
tax_id='10029',
genus="Cricetulus",
species="griseus",
NCBIFilesDir = ".",
rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: no such table: altGO_date
b. The second error: with rebuildCache=TRUE (the default, so the call below omits it), I get 'Error in file(description = tmp, open = "r") : object 'tmp' not found'
makeOrgPackageFromNCBI(
version="0.1",
maintainer="Guido Hooiveld <guido.hooiveld@wur.nl>",
author="Guido Hooiveld <guido.hooiveld@wur.nl>",
outputDir=".",
tax_id='10029',
genus="Cricetulus",
species="griseus")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
Error in file(description = tmp, open = "r") : object 'tmp' not found
c. The third error: when I tried a different organism, I got another problem: 'Error in FUN(X[[i]], ...) : Please use 'available.species' to see viable species names or tax Ids'
makeOrgPackageFromNCBI(version = "0.0.1",
author = "me",
maintainer = "me <me@mine.org>",
outputDir = ".",
tax_id = '7227',
genus = "Drosophila",
species = 'Drosophila melanogaster',
NCBIFilesDir = ".",
rebuildCache=FALSE)
or
makeOrgPackageFromNCBI(version = "0.0.1",
author = "me",
maintainer = "me <me@mine.org>",
outputDir = ".",
tax_id = '7227',
genus = "Drosophila",
species = 'melanogaster',
NCBIFilesDir = ".",
rebuildCache=FALSE)
Both calls fail with the same error:
preparing data from NCBI ...
starting download for 6 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene2unigene
extracting data for our organism from : gene2unigene
getting all data for our organism from : gene2unigene
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Loading required package: httr
Attaching package: ‘httr’
The following object is masked from ‘package:Biobase’:
content
Loading required package: RCurl
Loading required package: bitops
Error in FUN(X[[i]], ...) :
Please use 'available.species' to see viable species names or tax Ids
Here is my session information:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.28.0 GenomeInfoDb_1.8.3 AnnotationForge_1.14.2 AnnotationDbi_1.34.4
[5] IRanges_2.6.1 S4Vectors_0.10.2 Biobase_2.32.0 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] rsconnect_0.4.3 DBI_0.4-1 tools_3.3.1 RCurl_1.95-4.8 RSQLite_1.0.0 bitops_1.0-6
[7] XML_3.98-1.4
Looking forward to your response~~
Thanks,
Shisheng

If you want to respond, use the ADD COMMENT button and type in the box that comes up. If you use the Add your answer box, it looks like you are answering your own question, which you are not doing.
As Marc pointed out, you can simply use the OrgDb on AnnotationHub.
> library(AnnotationHub)
> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
|======================================================================| 100%
snapshotDate(): 2016-07-20
> query(hub, c("OrgDb","Drosophila melanogaster"))
AnnotationHub with 1 record
# snapshotDate(): 2016-07-20
# names(): AH49581
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Drosophila melanogaster
# $rdataclass: OrgDb
# $title: org.Dm.eg.db.sqlite
# $description: NCBI gene ID based annotations about Drosophila melanogaster
# $taxonomyid: 7227
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation
# retrieve record with 'object[["AH49581"]]'
> dm <- hub[["AH49581"]]
downloading from 'https://annotationhub.bioconductor.org/fetch/56311'
retrieving 1 resource
|======================================================================| 100%
> dm
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: FLY_DB
| ORGANISM: Drosophila melanogaster
| SPECIES: Fly
| EGSOURCEDATE: 2015-Aug11
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 7227
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20150808
| GOEGSOURCEDATE: 2015-Aug11
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Drosophila melanogaster)
| GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/dm6
| GPSOURCEDATE: 2014-Dec12
| FBSOURCEDATE: -Jan08
| FBSOURCENAME: Flybase
| FBSOURCEURL: ftp://ftp.flybase.net/releases/current/precomputed_files/genes/
| ENSOURCEDATE: 2015-Jul16
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
Please see: help('select') for usage information
As to your error, the help page for that function says
genus: Single string indicating the genus.
species: Single string indicating the species.
And the species in this situation is "melanogaster", not "Drosophila melanogaster", which is the genus and species.
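To make the genus/species distinction concrete, here is a minimal base-R sketch. The helper `splitBinomial()` is hypothetical (it is not part of AnnotationForge); it simply splits a full binomial name into the two separate strings that the `genus` and `species` arguments expect.

```r
# Hypothetical helper (not part of AnnotationForge): split a binomial name
# such as "Drosophila melanogaster" into the separate strings expected by
# the genus= and species= arguments of makeOrgPackageFromNCBI().
splitBinomial <- function(binomial) {
  parts <- strsplit(trimws(binomial), "\\s+")[[1]]
  if (length(parts) < 2L)
    stop("expected at least 'Genus species', got: ", sQuote(binomial))
  list(genus = parts[1L],
       # everything after the genus is returned as the species string
       species = paste(parts[-1L], collapse = " "))
}

nm <- splitBinomial("Drosophila melanogaster")
nm$genus    # "Drosophila"
nm$species  # "melanogaster"
```

As the follow-up posts show, correct genus/species strings alone did not fix the failure on Windows.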
Hi James,
Thank you for the warning and the answer. This is my first time using the Bioconductor support site; I will keep that in mind next time!
Regarding my question: first, I know I could get the Drosophila OrgDb from AnnotationHub; I just want to try out and learn makeOrgPackageFromNCBI() to see whether it works on my computer ^_^
Second, I have read the help page for that function and tried using only 'melanogaster' for species, but it did not help; the same problem still occurred. What's more, I checked available.species():
> spec <- available.species()
> spec[which(as.numeric(spec$taxon)==7227),]
taxon species
10836 7227 Drosophila melanogaster
As you can see, the species listed is 'Drosophila melanogaster'. I even tried 'Drosophila_melanogaster' and 'Drosophilamelanogaster', but the problem is always there.
Third, 'Drosophila melanogaster' is just an example, not the organism I study. As described above, I posted three problems from trying different examples (none of them is my study organism) on my computer; I just want to learn this awesome function for my future research.
Therefore, I really need your help to fix these three problems with makeOrgPackageFromNCBI() on my computer. Please do not advise me to give up on the function...
Thanks a lot ^_^
Shisheng
Well, you don't say what this mysterious species is, but if I assume it's Cricetulus griseus, then
> makeOrgPackageFromNCBI("0.0.1", "me@mine.org", "me",tax_id="10029", genus="Cricetulus",species="griseus")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
rebuilding the cache
Loading required package: RCurl
Loading required package: bitops
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Loading required package: biomaRt
Loading required package: httr
Attaching package: httr
The following object is masked from package:Biobase:
content
Please be patient while we work out which organisms can be annotated with ensembl IDs.
making the OrgDb package ...
Loading required package: RSQLite
Loading required package: DBI
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
Loading required package: GO.db
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in /misc/jmacdon/org.Cgriseus.eg.db
Now deleting temporary database file
complete!
[1] "org.Cgriseus.eg.sqlite"
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GO.db_3.3.0 RSQLite_1.0.0 DBI_0.4-1
[4] httr_1.2.1 biomaRt_2.28.0 RCurl_1.95-4.8
[7] bitops_1.0-6 AnnotationForge_1.14.2 AnnotationDbi_1.34.4
[10] IRanges_2.6.1 S4Vectors_0.10.2 Biobase_2.32.0
[13] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.4 GenomeInfoDb_1.8.3 R6_2.1.2 tools_3.3.0
[5] compiler_3.3.0
> library(AnnotationHub)
Attaching package: AnnotationHub
The following object is masked from package:Biobase:
cache
OR, as Marc already pointed out, there are literally (yes, literally!) thousands of species in the AnnotationHub, this one being represented twice.
> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
|======================================================================| 100%
snapshotDate(): 2016-07-20
> grep(hub, c("OrgDb","Cricetulus griseus"))
Error in as.character.default(pattern) :
no method for coercing this S4 class to a vector
> query(hub, c("OrgDb","Cricetulus griseus"))
AnnotationHub with 2 records
# snapshotDate(): 2016-07-20
# $dataprovider: NCBI, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Cricetulus griseus
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
# sourcetype
# retrieve records with, e.g., 'object[["AH12820"]]'
title
AH12820 | org.Cricetulus_griseus.eg.sqlite
AH48061 | org.Cricetulus_griseus.eg.sqlite
Hi James,
So strange. Well, it works on your computer but not on mine. Could you help me check whether makeOrgPackageFromNCBI() can build a package for the organism I study, 'Mycoplasma hyopneumoniae 168-L', a very rare species:
> spec <- available.species()
> spec[which(as.numeric(spec$taxon)==1116211),]
taxon species
1031039 1116211 Mycoplasma hyopneumoniae 168-L
I checked AnnotationHub, but it returned no records:
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2016-07-20
> query(hub, c("OrgDb","Mycoplasma hyopneumoniae 168-L"))
AnnotationHub with 0 records
# snapshotDate(): 2016-07-20
Many thanks,
Shisheng
You won't be able to build an OrgDb package for a species that isn't in NCBI's databases.
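One way to check this before starting the (slow) build is to count the rows for your tax ID in NCBI's gene_info.gz, whose first tab-separated column is the tax ID. The sketch below assumes you have already downloaded that file (as in the first post); `countTaxonRows()` is a hypothetical helper that streams the file in chunks instead of loading it all into memory.

```r
# Hypothetical helper: count the lines of an NCBI gene_info(.gz) file whose
# first tab-separated field equals tax_id. gzfile() can also read
# uncompressed files, so the same code works on a decompressed copy.
countTaxonRows <- function(path, tax_id, chunk = 100000L) {
  prefix <- paste0(tax_id, "\t")   # e.g. "1116211\t"; header "#tax_id" never matches
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    n <- n + sum(startsWith(lines, prefix))
  }
  n
}

# Example (needs gene_info.gz in the working directory; the file is large):
# countTaxonRows("gene_info.gz", 1116211)  # 0 means no NCBI Gene records for that tax ID
```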
Hi James,
With AnnotationHub, how do I choose which object to use? For example:
> query(hub, c("OrgDb","Solanum lycopersicum"))
AnnotationHub with 2 records
# snapshotDate(): 2016-07-20
# $dataprovider: NCBI, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Solanum lycopersicum
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description, tags, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH13359"]]'
title
AH13359 | org.Solanum_lycopersicum.eg.sqlite
AH48047 | org.Solanum_lycopersicum.eg.sqlite
It shows two results, "AH13359" and "AH48047". Does that mean "Solanum lycopersicum" has two sub-taxa? How can I tell these two apart, and which one should I choose?
Thanks a lot,
Shisheng
Hi James,
Sorry to come back to this rather old thread, but I got stuck when I tried to reproduce your run of makeOrgPackageFromNCBI(). See below. FYI: I tried to build an OrgDb for Chinese hamster myself rather than use the one in AnnotationHub, because I would like to see the differences between a 'fresh' OrgDb and a slightly dated version (triggered by the thread 'Org.db: why a supposed unique key (ID) has multiple entries?'). I got this error:
First of all, I don't know why 'Ailuropoda' is returned, since I never supplied that name...??
Anyway, I checked for Chinese hamster using available.species(), and I found two entries with the same tax ID:
> spec <- available.species()
> spec[grepl('griseus',spec$species),]
taxon species
<<snip>>
13907 10029 Cricetulus barabensis griseus
13908 10029 Cricetulus griseus
<<snip>>
After looking at the source code of GenomeInfoDb, I noticed that the error message is triggered by calling the internal function .getTaxonomyId(). When I run this function manually, it goes wrong because NA is returned, which then results in the error message being printed.
> data(speciesMap, package="GenomeInfoDb")
> species="griseus"
> species <- gsub(" {2,}", " ", species)
> species <- gsub(",", " ", species, fixed=TRUE)
> idx <- match(species, speciesMap$species)
> idx
[1] NA
As far as I can tell, this is caused by tax ID 10029 matching two descriptors/synonyms (but maybe I am completely wrong!)
Any suggestions on how to get it working? :)
Thanks,
Guido
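The NA in the manual .getTaxonomyId() run above is exactly what `match()` gives for a partial name: it requires whole-string equality, so the bare epithet "griseus" matches neither of the two names listed for tax ID 10029. A toy illustration, assuming a two-row table shaped like the available.species() excerpt above (`toyMap` is made up; it is not the real `speciesMap` object):

```r
# Toy stand-in for GenomeInfoDb's speciesMap (not the real object): two
# names share tax ID 10029, as in the available.species() output above.
toyMap <- data.frame(
  taxon   = c(10029L, 10029L),
  species = c("Cricetulus barabensis griseus", "Cricetulus griseus"),
  stringsAsFactors = FALSE
)

# match() needs the whole string, so the bare epithet finds nothing ...
match("griseus", toyMap$species)             # NA
# ... while the full binomial finds its row.
match("Cricetulus griseus", toyMap$species)  # 2

# Looking up by tax ID instead returns every name listed for that ID.
toyMap$species[toyMap$taxon == 10029L]
```

The traceback later in the thread suggests the failing lookup in this run was actually for 'Ailuropoda melanoleuca', not for the hamster name.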
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)
> makeOrgPackageFromNCBI("0.0.1", "guido.hooiveld@wur.nl", "Guido Hooiveld",tax_id="10029", genus="Cricetulus",species="griseus", rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Loading required package: biomaRt
Loading required package: httr
Attaching package: ‘httr’
The following object is masked from ‘package:Biobase’:
content
Loading required package: RCurl
Loading required package: bitops
Error in FUN(X[[i]], ...) :
1 unknown species: ‘Ailuropoda melanoleuca ’
Please use 'available.species' to see viable species names or tax Ids
> dir()
[1] "gene_info.gz" "gene2accession.gz"
[3] "gene2go.gz" "gene2pubmed.gz"
[5] "gene2refseq.gz" "idmapping_selected.tab.gz"
[7] "NCBI.sqlite"
(Separate post because of the character limit of the other post.)
Update: in a fresh R session the error mentioned above persists, even when running makeOrgPackageFromNCBI() without explicitly specifying genus and species. The strange error about ‘Ailuropoda melanoleuca’ is also still returned...??
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)
> makeOrgPackageFromNCBI("0.0.1", "guido.hooiveld@wur.nl", "Guido Hooiveld",tax_id="10029", rebuildCache=FALSE)
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Loading required package: biomaRt
Loading required package: httr
Attaching package: ‘httr’
The following object is masked from ‘package:Biobase’:
content
Loading required package: RCurl
Loading required package: bitops
Error in FUN(X[[i]], ...) :
1 unknown species: ‘Ailuropoda melanoleuca ’
Please use 'available.species' to see viable species names or tax Ids
> makeOrgPackageFromNCBI("0.0.1", "me","me@mine.org", ".", "10029", "Cricetulus","griseus")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
rebuilding the cache
Loading required package: RCurl
Loading required package: bitops
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Loading required package: biomaRt
Loading required package: httr
Attaching package: ‘httr’
The following object is masked from ‘package:Biobase’:
content
Please be patient while we work out which organisms can be annotated with ensembl IDs.
making the OrgDb package ...
Loading required package: RSQLite
Loading required package: DBI
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
Loading required package: GO.db
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in ./org.Cgriseus.eg.db
Now deleting temporary database file
complete!
[1] "org.Cgriseus.eg.sqlite"
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GO.db_3.3.0 RSQLite_1.0.0 DBI_0.4-1
[4] httr_1.2.1 biomaRt_2.28.0 RCurl_1.95-4.8
[7] bitops_1.0-6 AnnotationForge_1.14.2 AnnotationDbi_1.34.4
[10] IRanges_2.6.1 S4Vectors_0.10.2 Biobase_2.32.0
[13] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.4 GenomeInfoDb_1.8.3 R6_2.1.2 tools_3.3.0
I think on Windows this regular expression https://github.com/Bioconductor-mirror/AnnotationForge/blob/master/R/NCBI_ftp.R#L1498 doesn't strip the '\r' from the end of the line, so the species name has a trailing '\r', which explains the odd formatting of the 'unknown species' error. Maybe there are other issues, and traceback() after the error would help.
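The suspected cause is easy to reproduce in base R. The sketch below uses a made-up two-line listing with Windows-style "\r\n" line endings in place of the real Ensembl FTP listing; it is an illustration, not the AnnotationForge code at line 1498. Splitting on "\n" alone leaves a trailing "\r" on every name, so an exact lookup fails, whereas the regular expression "\r?\n" handles both line-ending styles.

```r
# Made-up listing with Windows (CRLF) line endings.
listing <- "Ailuropoda melanoleuca\r\nBos taurus\r\n"
known <- c("Ailuropoda melanoleuca", "Bos taurus")

# Splitting on "\n" alone keeps a trailing "\r" on each entry ...
bad <- strsplit(listing, "\n")[[1]]
match(bad, known)   # NA NA

# ... whereas "\r?\n" removes either line ending before the lookup.
good <- strsplit(listing, "\r?\n")[[1]]
match(good, known)  # 1 2
```

Splitting on "\r\n" only (the local patch described later in the thread) would leave an LF-only listing unsplit, which is why the optional "\r" is the more portable choice.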
I deleted all files and the cache, and started a new session. I copied the last code from James (except for name and email), but a few hours later I still got the same error.... I now also noticed that the OP reported the same error. Apparently this is specific to the Windows platform...
Below is also the output of traceback().
> makeOrgPackageFromNCBI("0.0.1", "Guido Hooiveld","guido.hooiveld@wur.nl", ".", "10029", "Cricetulus","griseus")
Error in FUN(X[[i]], ...) :
1 unknown species: ‘Ailuropoda melanoleuca ’
Please use 'available.species' to see viable species names or tax Ids
> traceback()
16: stop(sum(is.na(idx)), " unknown species: ", paste(sQuote(head(species[is.na(idx)])), "Please use 'available.species' to see viable species names or tax Ids", collapse = " "))
15: FUN(X[[i]], ...)
14: lapply(species, .getTaxonomyId)
13: lapply(species, .getTaxonomyId)
12: unlist(lapply(species, .getTaxonomyId))
11: FUN(X[[i]], ...)
10: lapply(specNames, GenomeInfoDb:::.taxonomyId)
9: lapply(specNames, GenomeInfoDb:::.taxonomyId)
8: unlist(lapply(specNames, GenomeInfoDb:::.taxonomyId))
7: getFastaSpeciesDirs()
6: available.FastaEnsemblSpecies()
5: available.ensembl.datasets()
4: tax_id %in% names(available.ensembl.datasets())
3: prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, verbose)
2: NEW_makeOrgPackageFromNCBI(version, maintainer, author, outputDir, tax_id, genus, species, NCBIFilesDir, databaseOnly, rebuildCache = rebuildCache, verbose = verbose)
1: makeOrgPackageFromNCBI("0.0.1", "Guido Hooiveld", "guido.hooiveld@wur.nl", ".", "10029", "Cricetulus", "griseus")
Another update: yes! It (almost) worked...
Triggered by Martin's comment, I downloaded the source code of AnnotationForge and modified line 1498 slightly by adding "\r":
listing<- strsplit(listing, "\r\n")[[1]]
[In addition, I noticed that in line 1492 the ENSEMBL database release is hard-coded to version 80. Since the current version is v85 (see the bottom of the FTP page), I changed that to 85 (getFastaSpeciesDirs <- function(release=85){), but I don't think this caused the error I experienced. Nevertheless, it may be good to have this set automatically to the latest version by using ftp://ftp.ensembl.org/pub/current_mysql?]
I then installed from source and reran makeOrgPackageFromNCBI(). Building the OrgDb works fine now. :) However, installing it does not work yet... (any suggestions on that? "Error : Invalid DESCRIPTION file. Malformed maintainer field.")
So the failure to build the OrgDb on a Windows machine seems to be solved by adding \r. Whether this change has any impact on Linux I don't know....
> makeOrgPackageFromNCBI("0.0.1", "Guido Hooiveld","guido.hooiveld@wur.nl", ".", "10029", "Cricetulus","griseus")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for 5 data files
<<snip>>
processing GO data
Loading required package: RCurl
Loading required package: bitops
Loading required package: biomaRt
Loading required package: httr
Attaching package: ‘httr’
The following object is masked from ‘package:Biobase’:
content
Please be patient while we work out which organisms can be annotated with ensembl IDs.
making the OrgDb package ...
Loading required package: RSQLite
Loading required package: DBI
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
Loading required package: GO.db
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in ./org.Cgriseus.eg.db
Now deleting temporary database file
complete!
[1] "org.Cgriseus.eg.sqlite"
Warning message:
In file.remove(dbFileName) :
cannot remove file './org.Cgriseus.eg.sqlite', reason 'Permission denied'
> install.packages(pkgs="org.Cgriseus.eg.db", repos = NULL, type="source")
* installing *source* package 'org.Cgriseus.eg.db' ...
Error : Invalid DESCRIPTION file
Malformed maintainer field.
See section 'The DESCRIPTION file' in the 'Writing R Extensions' manual.
ERROR: installing package DESCRIPTION failed for package 'org.Cgriseus.eg.db'
* removing 'C:/Program Files/R/R-3.3.1patched/library/org.Cgriseus.eg.db'
Warning messages:
1: running command '"C:/PROGRA~1/R/R-33~1.1PA/bin/x64/R" CMD INSTALL -l "C:\Program Files\R\R-3.3.1patched\library" "org.Cgriseus.eg.db"' had status 1
2: In install.packages(pkgs = "org.Cgriseus.eg.db", repos = NULL, type = "source") :
installation of package ‘org.Cgriseus.eg.db’ had non-zero exit status
Here is the error:
Which seems pretty self-explanatory, and a quick look at the argument positions
> args(makeOrgPackageFromNCBI)
function (version, maintainer, author, outputDir = getwd(), tax_id, genus = NULL, species = NULL, NCBIFilesDir = getwd(), databaseOnly = FALSE, useDeprecatedStyle = FALSE, rebuildCache = TRUE, verbose = TRUE)
should have allowed you to self-diagnose.
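The positional mix-up can be reproduced without AnnotationForge. The toy function below is hypothetical: it copies only the first three formals shown by args() (version, maintainer, author) and applies a rough check that maintainer looks like "Name <email>", the form R expects in a DESCRIPTION Maintainer field. The regular expression is an approximation, not the exact rule R CMD INSTALL applies.

```r
# Hypothetical stand-in with the same leading formals as
# makeOrgPackageFromNCBI(); it only reports how the arguments were bound.
toyMakeOrg <- function(version, maintainer, author, ...) {
  # Rough "Some Name <user@host>" check (an approximation, see lead-in).
  ok <- grepl("^[^<>]+ <[^<>@ ]+@[^<>@ ]+>$", maintainer)
  list(maintainer = maintainer, author = author, maintainerOk = ok)
}

# Positional call as in the thread: the bare name lands in 'maintainer'.
bad <- toyMakeOrg("0.0.1", "Guido Hooiveld", "guido.hooiveld@wur.nl")
bad$maintainerOk   # FALSE

# Named arguments make the intent explicit and well formed.
good <- toyMakeOrg(version = "0.0.1",
                   maintainer = "Guido Hooiveld <guido.hooiveld@wur.nl>",
                   author = "Guido Hooiveld")
good$maintainerOk  # TRUE
```

Passing the real function's arguments by name in the same way avoids depending on their order.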
Thanks James, quite obvious indeed...
All working now:
@ Martin: I noticed you had already fixed the end-of-line issue; would it also be an idea to change the ENSEMBL FTP address so that the latest release is always used? See my comment above in the thread 'problem with makeOrgPackageFromNCBI when making an annotation package'.
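As a sketch of what "always use the latest release" could look like, the hypothetical helper below picks the highest number from a vector of FTP directory names such as "release-80" and "release-85". Fetching the real listing from the Ensembl server is deliberately left out, and the "release-N" naming is an assumption about that server's layout.

```r
# Hypothetical helper: return the highest Ensembl release number found in
# a vector of directory names (NA if none look like "release-N").
latestRelease <- function(dirs) {
  dirs <- trimws(dirs)  # also drops stray "\r" from CRLF listings
  hits <- regmatches(dirs, regexpr("^release-[0-9]+$", dirs))
  if (length(hits) == 0L) return(NA_integer_)
  # compare numerically so "release-9" does not beat "release-85"
  max(as.integer(sub("^release-", "", hits)))
}

latestRelease(c("current_mysql", "release-80", "release-85", "release-9"))  # 85
```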