Hi,
I am trying to run makeOrgPackageFromNCBI, as follows -
> makeOrgPackageFromNCBI(version="0.1",
+                        author="otills <*******@*******>",
+                        maintainer="otills <*******@*******>",
+                        outputDir=".",
+                        tax_id="582868",
+                        genus="Mollusca",
+                        species="Radix balthica")
However, after approx 30 mins I get the error -
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
extracting only data for our organism from : gene2pubmed
Getting data for gene2accession.gz
Error in sqliteSendQuery(con, statement, bind.data) :
  error in statement: duplicate column name: NA

I've tried re-running (including on different machines), but I get this error consistently. The files (gene2accession.gz, gene2pubmed.gz and NCBI.sqlite) are always the same size at the time of the crash.
Can anyone suggest what the problem might be?
Oli
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] RCurl_1.95-4.5 bitops_1.0-6 AnnotationForge_1.8.1 org.Hs.eg.db_3.0.0 RSQLite_1.0.0
[6] DBI_0.3.1 AnnotationDbi_1.28.1 GenomeInfoDb_1.2.4 IRanges_2.0.1 S4Vectors_0.4.0
[11] Biobase_2.26.0 BiocGenerics_0.12.1 BiocInstaller_1.16.1

loaded via a namespace (and not attached):
[1] tools_3.1.2
Hi Oliver,
It turns out that NCBI has changed the format of the gene2accession and gene2refseq files by adding three additional columns. Since the code that parses these files expected three fewer columns, the result is as you see. After making some small changes in the underlying code in the Devel branch, I get things to work:
Also note that the genus is "Drosophila" and the species is "melanogaster", and that the maintainer has to be of the form "me <me@mine.org>", angle brackets included, or the package won't install correctly.
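As a rough sketch of the maintainer format mentioned above, a check along these lines could be run before building. The helper name is_valid_maintainer and the regular expression are only an illustration, not part of AnnotationForge, and the pattern is deliberately loose.

```r
# Hypothetical helper (not part of AnnotationForge): checks that a maintainer
# string looks like "Name <user@host>", angle brackets included.
is_valid_maintainer <- function(x) {
  grepl("^[^<>]+ <[^<>@[:space:]]+@[^<>@[:space:]]+>$", x)
}

is_valid_maintainer("me <me@mine.org>")   # TRUE
is_valid_maintainer("me me@mine.org")     # FALSE: no angle brackets
```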
I'll send a patch to Marc Carlson, who is the maintainer for AnnotationForge, and hopefully we will get an updated version pushed in the next day or so.
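To see how extra columns in the file could turn into "duplicate column name: NA", here is a minimal base-R sketch. It assumes the parser labels incoming columns from a fixed vector of expected names; the names and counts below are placeholders rather than the real gene2accession headers, and this is not the actual AnnotationForge code.

```r
# Placeholder column names the parser is assumed to expect
expected_names <- c("tax_id", "gene_id", "status")

# The downloaded file is assumed to carry three more columns than expected
n_cols_in_file <- length(expected_names) + 3

# Indexing past the end of a character vector yields NA for the extra slots
col_names <- expected_names[seq_len(n_cols_in_file)]
print(col_names)

# How many columns ended up unnamed, and does any name repeat?
n_unnamed <- sum(is.na(col_names))
has_duplicates <- anyDuplicated(col_names) > 0
```

If names like these were then pasted into a CREATE TABLE statement, the name NA would appear more than once, which is consistent with the error SQLite reported.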
In the meantime, if you are impatient, you can download the source package and change the function .primaryFiles() in NCBI_ftp.R to be like this:
Save that file, and then you should be able to do
at an R prompt, where AnnotationForge is in your working directory.
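The exact command is not reproduced above. One common way to install a package from a local source directory, offered here as an assumption rather than as the command originally posted, is install.packages() with repos = NULL and type = "source". The wrapper name install_local_source is hypothetical.

```r
# Hypothetical wrapper: install a package from an unpacked source directory.
install_local_source <- function(pkg_dir = "AnnotationForge") {
  # A source package directory must contain a DESCRIPTION file
  if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in '", pkg_dir,
         "'; check your working directory.")
  }
  # repos = NULL tells R to install from the local path, not a repository
  install.packages(pkg_dir, repos = NULL, type = "source")
}

# Usage, with the edited AnnotationForge directory in the working directory:
# install_local_source("AnnotationForge")
```

The DESCRIPTION check is only there to fail early if R was started in the wrong directory.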