I am trying to make an organism db to use with goseq for Panicum hallii. Using the biomaRt package I found which dataset to use, but I get an error saying that 0 or more than 1 subdir was found. Looking online and at Ensembl Plants, I don't see multiple subdirectories. I am not sure how to fix the error.

> # make the organism db
> makeOrganismDbFromBiomart(biomart="plants_mart",
+                           dataset="phfil2_eg_gene",
+                           id_prefix="ensembl_",
+                           host="")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in .Ensembl_getMySQLCoreDir(dataset, release = release, use.grch37 = use.grch37,  : 
  found 0 or more than 1 subdir for "phfil2_eg_gene" dataset at

> session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2021-06-16                  

The short answer is that you can't fix it. Deep in the bowels of the GenomicFeatures package is a function that parses the directory structure of the Ensembl FTP site and tries to get some data. It's a bit of a trick, because it implicitly assumes that the FTP directories follow a particular naming convention, which can then be parsed to match the first part of the dataset argument you provided (phfil2). Unfortunately, the parsed directory name for P. hallii FIL2 ends up being "phallii_fil2" instead of "phfil2". Those two strings are not the same, so the match fails. The function already contains one hard-coded workaround:

.Ensembl_getMySQLCoreDir <- function (dataset, release = NA, use.grch37 = FALSE, kingdom = NA, 
    url = NA) 
{
    # Fall back to the default Ensembl MySQL FTP location when no url is given
    if (is.na(url)) 
        url <- ftp_url_to_Ensembl_mysql(release, use.grch37, kingdom)
    core_dirs <- Ensembl_listMySQLCoreDirs(release = release, 
        use.grch37 = use.grch37, kingdom = kingdom, url = url)
    # e.g. "genus_species_core_..." -> "genus_species" -> "gspecies"
    trimmed_core_dirs <- sub("_core_.*$", "", core_dirs)
    shortnames <- sub("^(.)[^_]*_", "\\1", trimmed_core_dirs)
    if (dataset == "mfuro_gene_ensembl") {
        shortname0 <- "mputorius_furo"   # <-- hard-coded workaround for ferrets
    }
    else {
        # Everything before the first underscore of the dataset name
        shortname0 <- strsplit(dataset, "_", fixed = TRUE)[[1L]][1L]
    }
    core_dir <- core_dirs[shortnames == shortname0]
    if (length(core_dir) != 1L) 
        stop("found 0 or more than 1 subdir for \"", dataset, 
            "\" dataset at ", url)
    core_dir
}

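To see the mismatch concretely, you can run the two name-derivation steps from that function on their own. This is just a sketch: the directory name below is an assumed example of the naming pattern (the numeric suffix is made up), not something copied from the FTP site.

```r
# Assumed example of a core directory name; the numeric suffix is illustrative only.
core_dirs <- "panicum_hallii_fil2_core_51_104_1"
dataset <- "phfil2_eg_gene"

# The same two regular expressions used in .Ensembl_getMySQLCoreDir
trimmed_core_dirs <- sub("_core_.*$", "", core_dirs)
shortnames <- sub("^(.)[^_]*_", "\\1", trimmed_core_dirs)

# Short name derived from the BioMart dataset name
shortname0 <- strsplit(dataset, "_", fixed = TRUE)[[1L]][1L]

print(shortnames)   # "phallii_fil2"
print(shortname0)   # "phfil2"

# No directory matches, so length(core_dir) is 0 and stop() is triggered
print(core_dirs[shortnames == shortname0])  # character(0)
```
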
I don't know what the best way to fix this might be. Ad hoc fixes tend not to work well in the long run, and maybe you don't really need the OrganismDbi package anyway. What exactly are you trying to do? I know it's 'use goseq', but there are two possibilities there.

You could be trying to get the average transcript length, in which case you need either a TxDb or an EnsDb, which contain that information. Or you may need the gene ID -> GO mappings, in which case you could just use the biomaRt package directly.
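
For the second case, something along these lines might work. It is only a sketch: I'm assuming the plants mart exposes the ensembl_gene_id and go_id attributes for this dataset (check with listAttributes() on the mart), and fetch_go_pairs and pairs_to_gene2cat are helper names I made up for illustration. The toy data frame stands in for the real download so the reshaping step can be tried offline.

```r
# Hypothetical helper: download gene ID / GO ID pairs from Ensembl Plants.
# Needs the biomaRt package and a network connection; the attribute names
# are assumptions and may differ for this dataset.
fetch_go_pairs <- function(dataset = "phfil2_eg_gene") {
    mart <- biomaRt::useEnsemblGenomes(biomart = "plants_mart", dataset = dataset)
    biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"), mart = mart)
}

# Hypothetical helper: turn the two-column data frame into a named list
# (gene ID -> character vector of GO IDs), dropping genes with no GO term.
pairs_to_gene2cat <- function(pairs) {
    pairs <- pairs[!is.na(pairs$go_id) & pairs$go_id != "", , drop = FALSE]
    split(pairs$go_id, pairs$ensembl_gene_id)
}

# Toy input standing in for the getBM() result (made-up IDs)
toy <- data.frame(ensembl_gene_id = c("geneA", "geneA", "geneB", "geneC"),
                  go_id = c("GO:0000001", "GO:0000002", "GO:0000001", ""),
                  stringsAsFactors = FALSE)
gene2cat <- pairs_to_gene2cat(toy)
print(gene2cat)
```

With real data you would use pairs_to_gene2cat(fetch_go_pairs()) and pass the result to goseq() through its gene2cat argument.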

So let us know which step you are trying to do and maybe we can come up with an alternative for you.


