Hey there,
I'm having some trouble getting makeOrgPackageFromNCBI() to work. As the output below shows, it downloads the data from NCBI just fine, then fails when trying to access BioMart. When I open the BioMart link from the error message, I just get a blank webpage with the text 0.7 on it, the same as when I tried a week ago. Any ideas? Based on other threads about similar errors, the problem is either that BioMart is down all the time, or that my university has a firewall (though I've never had trouble with anything else). SessionInfo is below.
Also, can someone please tell me what is going to be in the OrgDb once I finally make it? It's unclear why the message says the database will be exactly 12 GB; surely that depends on how much info there is for the organism in question? I checked the help files (e.g. ?`OrgDb-class`), and they don't seem to say what's in the db.
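One way to see what an OrgDb holds is to ask the object itself. The sketch below assumes AnnotationDbi is installed; `summarise_orgdb` is a made-up helper name (not part of any package), and `org.Hs.eg.db` is only a stand-in for whichever OrgDb happens to be available.

```r
# Hypothetical helper (not part of AnnotationDbi): list what an OrgDb offers.
summarise_orgdb <- function(db) {
  list(
    columns  = AnnotationDbi::columns(db),      # fields that select() can return
    keytypes = AnnotationDbi::keytypes(db),     # fields usable as lookup keys
    n_keys   = length(AnnotationDbi::keys(db))  # number of keys of the default key type
  )
}

# Stand-in example; only runs if the human org package happens to be installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  info <- summarise_orgdb(org.Hs.eg.db::org.Hs.eg.db)
  print(info$columns)
}
```

The same call should work on a package built by makeOrgPackageFromNCBI() once it is installed and loaded, since the result is also an OrgDb object.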
Cheers!
Luke
AnnotationForge::makeOrgPackageFromNCBI(
  version = "0.1",
  maintainer = "Luke",
  author = "Luke",
  outputDir = getwd(),
  tax_id = "7460",
  genus = "Apis",
  species = "mellifera",
  NCBIFilesDir = getwd(),
  databaseOnly = FALSE,
  useDeprecatedStyle = FALSE,
  rebuildCache = TRUE,
  verbose = TRUE)
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene2unigene
[5] gene_info.gz
[6] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene2unigene
rebuilding the cache
extracting data for our organism from : gene2unigene
getting all data for our organism from : gene2unigene
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated with ensembl IDs.
Request to BioMart web service failed.
The BioMart web service you're accessing may be down.
Check the following URL and see if this website is available:
http://www.ensembl.org:80/biomart/martservice?type=version&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
Error in if (BioMartVersion == "\n" | BioMartVersion == "") { :
argument is of length zero
In addition: Warning messages:
1: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
2: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
3: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
4: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
5: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
6: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
7: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
8: In result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
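A note on the final error above: in R, `if` stops with "argument is of length zero" whenever its condition has length 0, which is what happens when the value being compared is empty (e.g. `character(0)` or `NULL`). Below is a minimal sketch of a length check that would turn this into a readable message. `check_version_reply` is a hypothetical helper for illustration, not biomaRt's actual code.

```r
# Hypothetical helper, not biomaRt's real implementation.
check_version_reply <- function(reply) {
  # An empty reply has length 0, so comparing it with "" yields logical(0),
  # and if (logical(0)) fails with "argument is of length zero".
  # Checking length() first avoids that.
  if (length(reply) == 0 || !nzchar(trimws(reply[1]))) {
    stop("BioMart returned no version string; the service may be down, ",
         "blocked, or the wrong host may have been chosen.", call. = FALSE)
  }
  trimws(reply[1])
}

check_version_reply("0.7\n")  # returns "0.7"
```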
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_0.5.0 bindrcpp_0.2.2 clusterProfiler_3.8.1 kableExtra_0.9.0 pander_0.6.2 sva_3.28.0 BiocParallel_1.14.2 genefilter_1.62.0
[9] mgcv_1.8-24 nlme_3.1-137 MuMIn_1.42.1 ecodist_2.0.1 gplots_3.0.1 ggjoy_0.4.1 ggridges_0.5.0 RColorBrewer_1.1-2
[17] gridExtra_2.3 ggdendro_0.1-20 ggrepel_0.8.0 ggplot2_3.0.0 stringr_1.3.1 tidyr_0.8.1 dplyr_0.7.6 reshape2_1.4.3
[25] RSQLite_2.1.1 WGCNA_1.63 fastcluster_1.1.25 dynamicTreeCut_1.63-1 GOstats_2.46.0 Category_2.46.0 Matrix_1.2-14 biomaRt_2.36.1
[33] GSEABase_1.42.0 graph_1.58.0 annotate_1.58.0 XML_3.98-1.12 AnnotationDbi_1.42.1 IRanges_2.14.10 S4Vectors_0.18.3 Biobase_2.40.0
[41] BiocGenerics_0.26.0
loaded via a namespace (and not attached):
[1] backports_1.1.2 Hmisc_4.1-1 fastmatch_1.1-0 plyr_1.8.4 igraph_1.2.1 lazyeval_0.2.1 splines_3.5.1 GenomeInfoDb_1.16.0
[9] robust_0.4-18 digest_0.6.15 foreach_1.4.4 htmltools_0.3.6 GOSemSim_2.6.0 viridis_0.5.1 GO.db_3.6.0 fansi_0.2.3
[17] gdata_2.18.0 magrittr_1.5 checkmate_1.8.5 memoise_1.1.0 fit.models_0.5-14 cluster_2.0.7-1 doParallel_1.0.11 limma_3.36.2
[25] readr_1.1.1 matrixStats_0.54.0 enrichplot_1.0.2 prettyunits_1.0.2 colorspace_1.3-2 rvest_0.3.2 blob_1.1.1 rrcov_1.4-4
[33] crayon_1.3.4 RCurl_1.95-4.11 bindr_0.1.1 impute_1.54.0 survival_2.42-6 iterators_1.0.10 glue_1.3.0 gtable_0.2.0
[41] UpSetR_1.3.3 Rgraphviz_2.24.0 DEoptimR_1.0-8 DOSE_3.6.1 mvtnorm_1.0-8 DBI_1.0.0 Rcpp_0.12.18 viridisLite_0.3.0
[49] xtable_1.8-2 progress_1.2.0 htmlTable_1.12 units_0.6-0 foreign_0.8-71 bit_1.1-14 preprocessCore_1.42.0 Formula_1.2-3
[57] AnnotationForge_1.22.1 htmlwidgets_1.2 httr_1.3.1 fgsea_1.6.0 acepack_1.4.1 pkgconfig_2.0.1 nnet_7.3-12 dbplyr_1.2.2
[65] utf8_1.1.4 labeling_0.3 tidyselect_0.2.4 rlang_0.2.1 munsell_0.5.0 tools_3.5.1 cli_1.0.0 evaluate_0.11
[73] yaml_2.2.0 knitr_1.20 bit64_0.9-7 robustbase_0.93-1.1 caTools_1.17.1.1 purrr_0.2.5 ggraph_1.0.2 RBGL_1.56.0
[81] xml2_1.2.0 DO.db_2.9 compiler_3.5.1 rstudioapi_0.7 tibble_1.4.2 tweenr_0.1.5 pcaPP_1.9-73 stringi_1.2.4
[89] highr_0.7 lattice_0.20-35 pillar_1.3.0 data.table_1.11.4 cowplot_0.9.3 bitops_1.0-6 qvalue_2.12.0 R6_2.2.2
[97] latticeExtra_0.6-28 KernSmooth_2.23-15 codetools_0.2-15 MASS_7.3-50 gtools_3.8.1 assertthat_0.2.0 rprojroot_1.3-2 withr_2.1.2
[105] GenomeInfoDbData_1.1.0 hms_0.4.2 rpart_4.1-13 rmarkdown_1.10 rvcheck_0.1.0 ggforce_0.1.3 base64enc_0.1-3
>
Great, thanks for the help! I'll have a look into it. Some ideas to improve the help file for makeOrgPackageFromNCBI:
- Explain that it is possible to search for available annotation packages using a separate package (AnnotationHub). I thought there were only 20 pre-built packages for the 'classic' organisms (human, mouse, fly, zebrafish, etc.), because I read that somewhere. If there are secretly dozens more pre-built ones for all sorts of random species, that would be good to mention (perhaps in the vignette for this function: https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationForge/inst/doc/MakingNewOrganismPackages.html).
- Add a more informative error message that explains what you just explained to me (e.g. "Maybe the function is choosing the wrong default, try manually specifying metazoa.ensembl.org or whatever"). You could also add an optional argument to the function to specify which type of organism it is, hiding the internal workings from users who just want to get their database working.
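For anyone else landing here, the AnnotationHub search mentioned in the first point might look roughly like this. It is a sketch that assumes AnnotationHub is installed and the hub is reachable; `find_orgdbs` is a made-up wrapper name, and whether a record exists for a given species depends on the hub's current contents.

```r
# Hypothetical wrapper: search AnnotationHub for OrgDb records for a species.
# Assumes AnnotationHub is installed and the machine can reach the hub.
find_orgdbs <- function(species) {
  hub <- AnnotationHub::AnnotationHub()
  AnnotationHub::query(hub, c("OrgDb", species))
}

# Not run automatically, because it needs network access:
# hits <- find_orgdbs("Apis mellifera")
# hits                # prints the matching records, if any
# org <- hits[[1]]    # downloads the first match as an OrgDb object
```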