Question

Error during makeOrgPackageFromNCBI

gabciamm • 0

Hi All,

I get an error when generating a package for sugar beet (and for other organisms as well). How can I fix it? Please help!

makeOrgPackageFromNCBI(version = "0.1",
                       author = "G_M <g@gmail.com>",
                       maintainer = "G_M <g@gmail.com>",
                       outputDir = ".",
                       tax_id = "3555",
                       genus = "Beta",
                       species = "sugar_beet",
                       NCBIFilesDir = getwd(),
                       rebuildCache = FALSE)
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
Error in `[.data.frame`(data, setdiff(names(data), names(field_types))) :
  undefined columns selected
In addition: Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
2: call dbDisconnect() when finished working with a connection
3: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
4: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries

annotationforge makeorgpackagefromncbi • 2.2k views

ADD COMMENT • link 8.5 years ago gabciamm • 0

score 0 · Answer 1 · 2017-07-18

James W. MacDonald 68k

Seems OK to me. Are you using old packages?

> makeOrgPackageFromNCBI("0.0.1", "me@mine.org","me",".", "3555", "Beta","sugarbeet", rebuildCache = FALSE)
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated
  with ensembl IDs.
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled

'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in ./org.Bsugarbeet.eg.db
Now deleting temporary database file
complete!
[1] "org.Bsugarbeet.eg.sqlite"
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-3.4.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.4.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] AnnotationForge_1.18.0 AnnotationDbi_1.38.1   IRanges_2.10.2        
[4] S4Vectors_0.14.2       Biobase_2.36.2         BiocGenerics_0.22.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11            GO.db_3.4.1             XML_3.98-1.7           
 [4] digest_0.6.12           bitops_1.0-6            GenomeInfoDb_1.12.1    
 [7] DBI_0.6-1               RSQLite_1.1-2           tools_3.4.0            
[10] biomaRt_2.32.0          RCurl_1.95-4.8          compiler_3.4.0         
[13] memoise_1.1.0           GenomeInfoDbData_0.99.0
>

ADD COMMENT • link 8.5 years ago James W. MacDonald 68k

Dear James,

I have no idea what is wrong with the packages I use; as far as I can tell, everything is up to date... :( If it works on your side, could you send me the result files for sugar beet (tax_id = 3555) and also for carrot (tax_id = 79200)? I would be very grateful.

Best,

Gabi

ADD REPLY • link 8.5 years ago gabciamm • 0

What's your sessionInfo()? If your packages are updated then you shouldn't be getting those warnings about dbFetch.

ADD REPLY • link 8.5 years ago James W. MacDonald 68k

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] AnnotationForge_1.16.1 AnnotationDbi_1.38.0 IRanges_2.8.2
[4] S4Vectors_0.12.2 Biobase_2.34.0 BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.12   XML_3.98-1.9   digest_0.6.12 bitops_1.0-6   DBI_0.7
[6] RSQLite_2.0    rlang_0.1.1    blob_1.1.0     bit64_0.9-7    RCurl_1.95-4.8
[11] bit_1.1-12     memoise_1.1.0 tibble_1.3.3

ADD REPLY • link 8.5 years ago gabciamm • 0

You are NOT using updated packages. You have an old version of R and Bioconductor. You need to update to R-3.4.1 and the current version of Bioconductor and try again.

ADD REPLY • link 8.5 years ago James W. MacDonald 68k

OK. I was sure everything was up to date, because I installed it last week, and when I tried to update I was told that everything was current. But I used Conda, so maybe that's the reason... Thanks for the help anyway.

ADD REPLY • link 8.5 years ago gabciamm • 0

Hi, I now have a new version of R and Bioconductor, and it still doesn't work.

If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
Error in `[.data.frame`(data, setdiff(names(data), names(field_types))) :
  undefined columns selected
In addition: Warning messages:
1: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
2: call dbDisconnect() when finished working with a connection
3: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
4: In rsqlite_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/user/programy/R_3.4.1/R-3.4.1/lib/libRblas.so
LAPACK: /home/user/programy/R_3.4.1/R-3.4.1/lib/libRlapack.so

locale:
[1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] AnnotationForge_1.18.0 AnnotationDbi_1.38.1 IRanges_2.10.2
[4] S4Vectors_0.14.3 Biobase_2.36.2 BiocGenerics_0.22.0
[7] BiocInstaller_1.26.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.12    XML_3.98-1.9    digest_0.6.12   bitops_1.0-6
[5] DBI_0.7         RSQLite_2.0     rlang_0.1.1     blob_1.1.0
[9] tools_3.4.1     bit64_0.9-7     RCurl_1.95-4.8 bit_1.1-12
[13] compiler_3.4.1 pkgconfig_2.0.1 memoise_1.1.0   tibble_1.3.3

ADD REPLY • link 8.5 years ago gabciamm • 0

OK, NCBI has added an extra column to the gene_info file, and AnnotationForge doesn't handle it yet. The quick fix is to run this at a terminal prompt:

zcat gene_info.gz | cut -f 1-15 > gene_info
rm gene_info.gz
gzip gene_info

and then re-run makeOrgPackageFromNCBI. If it downloads the gene_info.gz file again, you have to let it download everything (or you could pre-emptively download the files from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ first), then stop the process, cut that last column out of gene_info.gz, and restart.
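
The workaround may need to be repeated whenever the file is downloaded again, so it can help to wrap it in a small function that only rewrites the file when it has more columns than expected. The following is a minimal sketch, not part of AnnotationForge: the function name `trim_gene_info` is made up for illustration, and the default of 15 columns is the number kept in the workaround.

```shell
#!/bin/sh
# trim_gene_info FILE [KEEP]
# Hypothetical helper: rewrites a gzipped, tab-separated FILE so that only
# the first KEEP columns remain (default 15, as in the workaround).
# Leaves the file untouched if it already has KEEP or fewer columns.
trim_gene_info() {
    file=$1
    keep=${2:-15}
    # Count the tab-separated fields on the first line (the header).
    ncol=$(gzip -dc "$file" | head -n 1 | awk -F '\t' '{ print NF }')
    ncol=${ncol:-0}
    if [ "$ncol" -le "$keep" ]; then
        echo "$file: $ncol columns, nothing to trim"
        return 0
    fi
    # Write to a temporary file first so a failure cannot clobber the original.
    tmp="$file.tmp.$$"
    gzip -dc "$file" | cut -f "1-$keep" | gzip -c > "$tmp" && mv "$tmp" "$file"
    echo "$file: trimmed from $ncol to $keep columns"
}
```

After sourcing the function, `trim_gene_info gene_info.gz` in the download directory should have the same effect as the three commands, and running it a second time changes nothing.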

I'll push a fix soon, but it will take a day or so to propagate to the download server.
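
For the pre-emptive download mentioned earlier, something like the following sketch could fetch the five files named in the log output. It assumes `curl` is installed, that the FTP location quoted in this thread is still valid, and that the files belong in the directory passed as NCBIFilesDir; the function names `ncbi_urls` and `fetch_ncbi_files` are made up for illustration.

```shell
#!/bin/sh
# Base URL and file names are taken from this thread; adjust them if NCBI
# moves or renames the files.
NCBI_BASE="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA"
NCBI_FILES="gene2pubmed.gz gene2accession.gz gene2refseq.gz gene_info.gz gene2go.gz"

# Print one download URL per line.
ncbi_urls() {
    for f in $NCBI_FILES; do
        printf '%s/%s\n' "$NCBI_BASE" "$f"
    done
}

# Download every file into DIR (default: current directory).
# curl flags: -f fail on server errors, -sS quiet but still report errors,
# -O keep the remote file name.
fetch_ncbi_files() {
    dir=${1:-.}
    for f in $NCBI_FILES; do
        (cd "$dir" && curl -f -sS -O "$NCBI_BASE/$f") || return 1
    done
}
```

Calling `fetch_ncbi_files .` downloads all five files into the current directory. The files are large, so this can take a while.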

ADD REPLY • link 8.5 years ago James W. MacDonald 68k

I have updated both the release and devel versions of AnnotationForge, and you should be able to get the updated package using biocLite in a day or two.