Hi,
there is a new issue since R version 3.4 with the package GenomicFeatures:
I created my own BSgenome package for Candida glabrata and loaded the GFF file with
txdb <- makeTxDbFromGFF(file=gffFile, dataSource="CGDB",
                        organism="Candida glabrata", format="gff3",
                        chrominfo=seqinfo_cg)
Error
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... Error in FUN(X[[i]], ...) :
1 unknown species: ‘Candida glabrata’ Please use 'available.species' to see viable species names or tax Ids
This issue does not occur in R version 3.3. Is specific metadata required now?
A few thoughts:
You could update to the latest versions of R (3.5) and Bioconductor (3.7).
You could use the available.species() function to see whether the organism argument needs to be updated. It's possible we have tightened the constraint on the naming.
Looking at the output of available.species() would also let you use the taxonomyId argument rather than the organism argument in makeTxDbFromGFF.
For the code below I was using R 3.5 and Bioc 3.7, so the values may differ when run in your R session.
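A minimal sketch of that lookup (not the reply's original code). It assumes available.species() is provided by the GenomeInfoDb package, as in the Bioconductor releases discussed here, and that it returns a data frame. The helper find_species() is a hypothetical name introduced for illustration; it makes no assumption about column names and simply searches the text of every row.

```r
# Hypothetical helper: return the rows of a species table whose text matches
# `pattern`. All columns are pasted together per row, so no particular column
# names are assumed.
find_species <- function(species_table, pattern) {
  row_text <- do.call(paste, c(unname(as.list(species_table)), sep = " "))
  species_table[grepl(pattern, row_text, ignore.case = TRUE), , drop = FALSE]
}

# Assumption: available.species() lives in the GenomeInfoDb namespace.
# The lookup is skipped if the package or the function is not present.
if (requireNamespace("GenomeInfoDb", quietly = TRUE) &&
    exists("available.species", envir = asNamespace("GenomeInfoDb"),
           inherits = FALSE)) {
  sp <- get("available.species", envir = asNamespace("GenomeInfoDb"))()
  print(find_species(sp, "glabrata"))
}
```

The matching rows should show the accepted species names together with their taxonomy IDs; either can then be passed to makeTxDbFromGFF.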
Thanks for your help. I will update my Bioconductor and try again. Just two questions:
1) Previously, local BSgenome packages (BSgenome.Cglabrata) could be used even though they were listed by installed.genomes() but not by available.genomes(). Why is this no longer possible?
2) How can I find the source (uploader) of the available genomes, e.g.
284593 Candida glabrata CBS 138
I think you might still be able to use the local BSgenomes, but you would have to specify a taxonomyId. The error you are seeing stems from looking up the taxonomy ID for the provided organism argument; if the ID is provided, this step is skipped.
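As a sketch of that suggestion, the call from the question could be repeated with taxonomyId supplied. The wrapper name make_cg_txdb is hypothetical, and the default ID 284593 is the value quoted above for Candida glabrata CBS 138; whether that strain matches your genome is an assumption you should check against available.species().

```r
# Hypothetical wrapper around the makeTxDbFromGFF() call from the question,
# with taxonomyId added so the organism-name lookup is not needed.
# Assumption: 284593 (Candida glabrata CBS 138, as quoted in this thread)
# is the right taxonomy ID for the genome in use.
make_cg_txdb <- function(gff_file, chrominfo, taxonomy_id = 284593) {
  GenomicFeatures::makeTxDbFromGFF(
    file       = gff_file,
    format     = "gff3",
    dataSource = "CGDB",
    organism   = "Candida glabrata",
    taxonomyId = taxonomy_id,
    chrominfo  = chrominfo
  )
}

# Usage with the objects from the question (not run here):
# txdb <- make_cg_txdb(gffFile, seqinfo_cg)
```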
This is the solution! Thanks. Maybe taxonomyId = NA should be documented under the error.
I'll look into clarifying the error message.