Error with makeOrganismDbFromBiomart
Catherine • 0
@e86206d2
Last seen 3.4 years ago
United States

I am trying to make an organism db for Panicum hallii to use with goseq. Using the biomaRt package I found which dataset to use, but I get an error saying that 0 or more than 1 subdir was found. Looking online and at Ensembl Plants, I don't see multiple subdirectories, and I am not sure how to fix the error.

> # make the organism db
> makeOrganismDbFromBiomart(biomart="plants_mart",
+                           dataset="phfil2_eg_gene",
+                           id_prefix="ensembl_",
+                           host="plants.ensembl.org")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in .Ensembl_getMySQLCoreDir(dataset, release = release, use.grch37 = use.grch37,  : 
  found 0 or more than 1 subdir for "phfil2_eg_gene" dataset at ftp://ftp.ensemblgenomes.org/pub/plants/release-51/mysql/

> session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2021-06-16                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version    date       lib source        
 annotate               1.70.0     2021-05-19 [1] Bioconductor  
 AnnotationDbi        * 1.54.1     2021-06-08 [1] Bioconductor  
 AnnotationForge        1.34.0     2021-05-19 [1] Bioconductor  
 apeglm               * 1.14.0     2021-05-19 [1] Bioconductor  
 ash                    1.0-15     2015-09-01 [1] CRAN (R 4.1.0)
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 backports              1.2.1      2020-12-09 [1] CRAN (R 4.1.0)
 base64url              1.4        2018-05-14 [1] CRAN (R 4.1.0)
 batchtools             0.9.15     2021-01-11 [1] CRAN (R 4.1.0)
 bbmle                  1.0.23.1   2020-02-03 [1] CRAN (R 4.1.0)
 bdsmatrix              1.3-4      2020-01-13 [1] CRAN (R 4.1.0)
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.1.0)
 Biobase              * 2.52.0     2021-05-19 [1] Bioconductor  
 BiocFileCache          2.0.0      2021-05-19 [1] Bioconductor  
 BiocGenerics         * 0.38.0     2021-05-19 [1] Bioconductor  
 BiocIO                 1.2.0      2021-05-19 [1] Bioconductor  
 BiocManager            1.30.16    2021-06-15 [1] CRAN (R 4.1.0)
 BiocParallel         * 1.26.0     2021-05-19 [1] Bioconductor  
 biomaRt              * 2.48.1     2021-06-08 [1] Bioconductor  
 Biostrings           * 2.60.1     2021-06-06 [1] Bioconductor  
 bit                    4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.1      2020-01-20 [1] CRAN (R 4.1.0)
 brew                   1.0-6      2011-04-13 [1] CRAN (R 4.1.0)
 BSgenome               1.60.0     2021-05-19 [1] Bioconductor  
 cachem                 1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr                  3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 Category               2.58.0     2021-05-19 [1] Bioconductor  
 checkmate              2.0.0      2020-02-06 [1] CRAN (R 4.1.0)
 cli                    2.5.0      2021-04-26 [1] CRAN (R 4.1.0)
 coda                   0.19-4     2020-09-30 [1] CRAN (R 4.1.0)
 colorspace             2.0-1      2021-05-04 [1] CRAN (R 4.1.0)
 crayon                 1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.1      2021-04-30 [1] CRAN (R 4.1.0)
 data.table             1.14.0     2021-02-21 [1] CRAN (R 4.1.0)
 DBI                    1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr                 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.18.0     2021-05-19 [1] Bioconductor  
 desc                   1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 DESeq2               * 1.32.0     2021-05-19 [1] Bioconductor  
 devtools             * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)
 digest                 0.6.27     2020-10-24 [1] CRAN (R 4.1.0)
 DOT                    0.1        2016-04-16 [1] CRAN (R 4.1.0)
 dplyr                * 1.0.6      2021-05-05 [1] CRAN (R 4.1.0)
 edgeR                  3.34.0     2021-05-19 [1] Bioconductor  
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 emdbook                1.3.12     2020-02-19 [1] CRAN (R 4.1.0)
 EnhancedVolcano      * 1.10.0     2021-05-19 [1] Bioconductor  
 extrafont              0.17       2014-12-08 [1] CRAN (R 4.1.0)
 extrafontdb            1.0        2012-06-11 [1] CRAN (R 4.1.0)
 fansi                  0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2      2018-10-05 [1] CRAN (R 4.1.0)
 formatR                1.11       2021-06-01 [1] CRAN (R 4.1.0)
 fs                     1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 futile.logger        * 1.4.3      2016-07-10 [1] CRAN (R 4.1.0)
 futile.options         1.0.1      2018-04-20 [1] CRAN (R 4.1.0)
 genefilter           * 1.74.0     2021-05-19 [1] Bioconductor  
 geneplotter            1.70.0     2021-05-19 [1] Bioconductor  
 generics               0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.28.0     2021-05-19 [1] Bioconductor  
 GenomeInfoDbData       1.2.6      2021-06-16 [1] Bioconductor  
 GenomicAlignments    * 1.28.0     2021-05-19 [1] Bioconductor  
 GenomicFeatures      * 1.44.0     2021-05-19 [1] Bioconductor  
 GenomicRanges        * 1.44.0     2021-05-19 [1] Bioconductor  
 ggalt                  0.4.0      2017-02-15 [1] CRAN (R 4.1.0)
 ggbeeswarm           * 0.6.0      2017-08-07 [1] CRAN (R 4.1.0)
 ggdendro             * 0.1.22     2020-09-13 [1] CRAN (R 4.1.0)
 ggplot2              * 3.3.4      2021-06-16 [1] CRAN (R 4.1.0)
 ggrastr                0.2.3      2021-03-01 [1] CRAN (R 4.1.0)
 ggrepel              * 0.9.1      2021-01-15 [1] CRAN (R 4.1.0)
 glue                   1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 GO.db                  3.13.0     2021-06-16 [1] Bioconductor  
 GOplot               * 1.0.2      2016-03-30 [1] CRAN (R 4.1.0)
 GOstats                2.58.0     2021-05-19 [1] Bioconductor  
 graph                  1.70.0     2021-05-19 [1] Bioconductor  
 gridExtra            * 2.3        2017-09-09 [1] CRAN (R 4.1.0)
 GSEABase               1.54.0     2021-05-19 [1] Bioconductor  
 gtable                 0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 hms                    1.1.0      2021-05-17 [1] CRAN (R 4.1.0)
 htmltools            * 0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)
 httr                   1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
 hwriter                1.3.2      2014-09-10 [1] CRAN (R 4.1.0)
 IRanges              * 2.26.0     2021-05-19 [1] Bioconductor  
 jpeg                   0.1-8.1    2019-10-24 [1] CRAN (R 4.1.0)
 jsonlite               1.7.2      2020-12-09 [1] CRAN (R 4.1.0)
 KEGGREST               1.32.0     2021-05-19 [1] Bioconductor  
 KernSmooth             2.23-20    2021-05-03 [1] CRAN (R 4.1.0)
 lambda.r               1.2.4      2019-09-18 [1] CRAN (R 4.1.0)
 lattice                0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
 latticeExtra           0.6-29     2019-12-19 [1] CRAN (R 4.1.0)
 lifecycle              1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 limma                * 3.48.0     2021-05-19 [1] Bioconductor  
 locfit                 1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)
 magrittr               2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 maps                   3.3.0      2018-04-03 [1] CRAN (R 4.1.0)
 MASS                   7.3-54     2021-05-03 [1] CRAN (R 4.1.0)
 Matrix                 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.4.0      2021-05-19 [1] Bioconductor  
 matrixStats          * 0.59.0     2021-06-01 [1] CRAN (R 4.1.0)
 memoise                2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 mvtnorm                1.1-2      2021-06-07 [1] CRAN (R 4.1.0)
 numDeriv               2016.8-1.1 2019-06-06 [1] CRAN (R 4.1.0)
 OrganismDbi          * 1.34.0     2021-05-19 [1] Bioconductor  
 pheatmap             * 1.0.12     2019-01-04 [1] CRAN (R 4.1.0)
 pillar                 1.6.1      2021-05-16 [1] CRAN (R 4.1.0)
 pkgbuild               1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 plyr                   1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
 png                    0.1-7      2013-12-03 [1] CRAN (R 4.1.0)
 PoiClaClu            * 1.0.2.1    2019-01-04 [1] CRAN (R 4.1.0)
 prettyunits            1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 progress               1.2.2      2019-05-16 [1] CRAN (R 4.1.0)
 proj4                  1.0-10.1   2021-01-26 [1] CRAN (R 4.1.0)
 ps                     1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R6                     2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
 RBGL                   1.68.0     2021-05-19 [1] Bioconductor  
 RColorBrewer         * 1.1-2      2014-12-07 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.6      2021-01-15 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)
 remotes                2.4.0      2021-06-02 [1] CRAN (R 4.1.0)
 restfulr               0.0.13     2017-08-06 [1] CRAN (R 4.1.0)
 Rgraphviz              2.36.0     2021-05-19 [1] Bioconductor  
 rjson                  0.2.20     2018-06-08 [1] CRAN (R 4.1.0)
 rlang                  0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 Rsamtools            * 2.8.0      2021-05-19 [1] Bioconductor  
 RSQLite                2.2.7      2021-04-22 [1] CRAN (R 4.1.0)
 rsvg                   2.1.2      2021-05-03 [1] CRAN (R 4.1.0)
 rtracklayer            1.52.0     2021-05-19 [1] Bioconductor  
 Rttf2pt1               1.3.8      2020-01-10 [1] CRAN (R 4.1.0)
 S4Vectors            * 0.30.0     2021-05-19 [1] Bioconductor  
 scales                 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo            1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 ShortRead            * 1.50.0     2021-05-19 [1] Bioconductor  
 stringi                1.6.2      2021-05-17 [1] CRAN (R 4.1.0)
 stringr                1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
 SummarizedExperiment * 1.22.0     2021-05-19 [1] Bioconductor  
 survival               3.2-11     2021-04-26 [1] CRAN (R 4.1.0)
 systemPipeR          * 1.26.2     2021-05-27 [1] Bioconductor  
 testthat               3.0.2      2021-02-14 [1] CRAN (R 4.1.0)
 tibble                 3.1.2      2021-05-16 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis              * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.1      2021-03-12 [1] CRAN (R 4.1.0)
 V8                     3.4.2      2021-05-01 [1] CRAN (R 4.1.0)
 VariantAnnotation      1.38.0     2021-05-19 [1] Bioconductor  
 vctrs                  0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 VennDiagram          * 1.6.20     2018-03-28 [1] CRAN (R 4.1.0)
 vipor                  0.4.5      2017-03-22 [1] CRAN (R 4.1.0)
 withr                  2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
 XML                    3.99-0.6   2021-03-16 [1] CRAN (R 4.1.0)
 xml2                   1.3.2      2020-04-23 [1] CRAN (R 4.1.0)
 xtable                 1.8-4      2019-04-21 [1] CRAN (R 4.1.0)
 XVector              * 0.32.0     2021-05-19 [1] Bioconductor  
 yaml                   2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
 zlibbioc               1.38.0     2021-05-19 [1] Bioconductor
FunctionalAnnotation • 901 views
@james-w-macdonald-5106
Last seen 24 minutes ago
United States

The short answer is that you can't fix it yourself. Deep inside the GenomicFeatures package is a function that parses the directory structure of the Ensembl FTP site to fetch some metadata. It relies on an implicit assumption that the FTP directories follow a particular naming scheme, so that each directory name can be reduced to a short name and matched against the first part of the dataset argument you provided (phfil2).

Unfortunately, for P. hallii fil2 the parsed directory name ends up being "phallii_fil2" instead of "phfil2", so the two strings don't match and the lookup fails. The function already contains one hard-coded workaround for a similar case:

.Ensembl_getMySQLCoreDir <- function (dataset, release = NA, use.grch37 = FALSE, kingdom = NA, 
    url = NA) 
{
    if (is.na(url)) 
        url <- ftp_url_to_Ensembl_mysql(release, use.grch37, 
            kingdom)
    core_dirs <- Ensembl_listMySQLCoreDirs(release = release, 
        use.grch37 = use.grch37, kingdom = kingdom, url = url)
    trimmed_core_dirs <- sub("_core_.*$", "", core_dirs)
    shortnames <- sub("^(.)[^_]*_", "\\1", trimmed_core_dirs)           
    if (dataset == "mfuro_gene_ensembl") {
        shortname0 <- "mputorius_furo"  # hard-coded workaround for ferrets
    }
    else {
        shortname0 <- strsplit(dataset, "_", fixed = TRUE)[[1L]][1L]
    }
    core_dir <- core_dirs[shortnames == shortname0]
    if (length(core_dir) != 1L) 
        stop("found 0 or more than 1 subdir for \"", dataset, 
            "\" dataset at ", url)
    core_dir
}
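
To see the mismatch without touching the FTP site, the two string manipulations from the function above can be run on their own. This is only a sketch: the helper names are mine, and the directory names (including their version suffixes) are assumed for illustration rather than copied from the server.

```r
# Short name derived from an FTP core directory name
# (same two sub() calls as in .Ensembl_getMySQLCoreDir).
core_dir_shortname <- function(core_dirs) {
  trimmed <- sub("_core_.*$", "", core_dirs)
  sub("^(.)[^_]*_", "\\1", trimmed)
}

# Short name derived from the BioMart dataset name.
dataset_shortname <- function(dataset) {
  strsplit(dataset, "_", fixed = TRUE)[[1L]][1L]
}

# Assumed directory names, for illustration only.
core_dir_shortname("homo_sapiens_core_104_38")           # "hsapiens"
dataset_shortname("hsapiens_gene_ensembl")               # "hsapiens" -> match

core_dir_shortname("panicum_hallii_fil2_core_51_104_1")  # "phallii_fil2"
dataset_shortname("phfil2_eg_gene")                      # "phfil2"   -> no match
```

The regular expression keeps the first letter of the genus and everything after the first underscore, which works for two-word names but not for a dataset whose name abbreviates more than the genus.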

I don't know what the best fix would be (ad hoc patches tend not to work well in the long run), and you may not need an OrganismDb package anyway. What exactly are you trying to do? I know the goal is to use goseq, but there are two possibilities.

You could be trying to get the average transcript length, in which case you need either a TxDb or an EnsDb, which contain that information. Or you may need the gene ID -> GO mappings, in which case you could just use the biomaRt package directly.
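
As a rough sketch of both routes, something like the following might work. Several things here are assumptions: that the attributes "ensembl_gene_id" and "go_id" exist for this dataset (check with listAttributes()), that the host needs the https:// prefix, and that you have downloaded a GFF3 annotation file (`gff_file` is a hypothetical path). The function names are mine, and the two network/file steps are only defined, not run. In newer Bioconductor releases makeTxDbFromGFF() may live in the txdbmaker package instead of GenomicFeatures.

```r
# Route 1: gene ID -> GO mappings straight from BioMart.
fetch_go_pairs <- function(dataset = "phfil2_eg_gene",
                           host = "https://plants.ensembl.org") {
  mart <- biomaRt::useMart("plants_mart", dataset = dataset, host = host)
  biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"), mart = mart)
}

# Convert the two-column result to a named list (gene -> GO IDs),
# dropping rows with no GO annotation (empty string or NA).
go_pairs_to_list <- function(pairs) {
  keep <- !is.na(pairs$go_id) & nzchar(pairs$go_id)
  pairs <- pairs[keep, , drop = FALSE]
  split(pairs$go_id, pairs$ensembl_gene_id)
}

# Route 2: transcript lengths from a TxDb built from a GFF3 file.
# Summarise per gene with the median; transcripts without a gene are dropped.
median_length_by_gene <- function(tx) {
  tx <- tx[!is.na(tx$gene_id), , drop = FALSE]
  vapply(split(tx$tx_len, tx$gene_id), stats::median, numeric(1))
}

gene_lengths_from_gff <- function(gff_file) {
  txdb <- GenomicFeatures::makeTxDbFromGFF(gff_file, format = "gff3")
  tx <- GenomicFeatures::transcriptLengths(txdb)  # columns include gene_id, tx_len
  median_length_by_gene(tx)
}
```

The list from go_pairs_to_list(fetch_go_pairs()) should be usable as the gene2cat argument of goseq(), and the lengths as bias.data in nullp(), provided the gene IDs match the ones in your count data.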

So let us know which step you are trying to do and maybe we can come up with an alternative for you.
