Custom BSgenome with makeTxDbFromGFF
@michaelweber1-11392

Hi,

there is a new issue since R version 3.4 with the package GenomicFeatures:

I created my own BSgenome package for Candida glabrata and loaded the GFF file with

txdb <- makeTxDbFromGFF(file=gffFile, dataSource="CGDB",
                        organism="Candida glabrata", format="gff3",
                        chrominfo=seqinfo_cg)

Error

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... Error in FUN(X[[i]], ...) :
  1 unknown species: ‘Candida glabrata’ Please use 'available.species' to see viable species names or tax Ids

This issue does not occur in R version 3.3. Is specific metadata required now?

 


A few thoughts:

  1. You could update to the latest versions of R (3.5) and Bioconductor (3.7).

  2. You could use the available.species() function to check whether the organism argument needs to be updated. It's possible we have tightened the constraint on the naming. Looking at the output of available.species() would also let you pass the taxonomyId argument to makeTxDbFromGFF instead of the organism argument.

I ran the code below with R 3.5 and Bioconductor 3.7, so the values may differ when it is run in your R session.

> temp = available.species() 
> dx = intersect(grep(temp$genus, pattern="Candida"), grep(temp$species, pattern="glabrata")) 
> temp[dx,]
         tax_id     genus                species
8017       5478 [Candida]               glabrata
288262   284593   Candida       glabrata CBS 138
451541   444776   Candida    cf. glabrata PT1-21
1091876 1231519   Candida    glabrata CAB52-4041
1164536 1308524   Candida       glabrata ADL-340
1164537 1308525   Candida       glabrata ADL-341
1164538 1308526   Candida       glabrata ADL-342
1164539 1308527   Candida       glabrata ADL-343
1164540 1308528   Candida       glabrata ADL-344
1164541 1308529   Candida       glabrata ADL-345
1244219 1398155   Candida       glabrata M202019
1249324 1403403   Candida        glabrata UOB301
1252760 1406948   Candida glabrata CCTCC M202019
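The grep-based lookup above returns every strain whose species name contains "glabrata". As a rough sketch of how the list could be narrowed to a single taxonomy ID, the example below matches the species name exactly. The find_tax_id helper is hypothetical (it is not part of GenomeInfoDb), and the small data frame only copies three rows from the output above so the example runs without Bioconductor; in a real session the table would come from available.species().

```r
# Three rows copied from the available.species() output above.
species_tbl <- data.frame(
  tax_id  = c(5478L, 284593L, 444776L),
  genus   = c("[Candida]", "Candida", "Candida"),
  species = c("glabrata", "glabrata CBS 138", "cf. glabrata PT1-21"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the genus is matched as a literal substring
# (so "[Candida]" also matches), the species must match exactly.
find_tax_id <- function(tbl, genus, species) {
  hit <- grepl(genus, tbl$genus, fixed = TRUE) & tbl$species == species
  tbl$tax_id[hit]
}

find_tax_id(species_tbl, "Candida", "glabrata")          # 5478
find_tax_id(species_tbl, "Candida", "glabrata CBS 138")  # 284593
```

Whether the species-level ID or a strain-level ID is the right choice depends on the genome the GFF file describes.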

Thanks for your help. I will update my Bioconductor and try again. Just two questions:
1) Previously, local BSgenome packages (e.g. BSgenome.Cglabrata) that are listed by installed.genomes() but not by available.genomes() could be used. Why is this no longer possible?

2) How can I find the source (uploader) of an available genome, e.g. 284593 Candida glabrata CBS 138?

 

 


I think you can still use local BSgenome packages, but you would have to specify a taxonomyId. The error you are seeing stems from looking up the taxonomyId for the provided organism argument; if the ID is provided, this step is skipped.
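A minimal sketch of what that could look like, reusing the call from the original question. The build_txdb wrapper is hypothetical and only keeps the example self-contained; gffFile and seqinfo_cg are the objects from the question, and 5478 is the ID listed for "[Candida] glabrata" in the available.species() output earlier in the thread. Whether that ID or a strain-level one fits your genome is an assumption you would need to check.

```r
# Hypothetical wrapper around makeTxDbFromGFF that always passes a taxonomy ID,
# so the organism name does not have to be resolved to an ID.
build_txdb <- function(gff_file, chrominfo, tax_id) {
  if (!requireNamespace("GenomicFeatures", quietly = TRUE)) {
    stop("The GenomicFeatures package is required.")
  }
  GenomicFeatures::makeTxDbFromGFF(
    file       = gff_file,
    format     = "gff3",
    dataSource = "CGDB",
    organism   = "Candida glabrata",
    taxonomyId = tax_id,      # supplied explicitly instead of the default NA
    chrominfo  = chrominfo
  )
}

# Usage with the objects from the question (not run here):
# txdb <- build_txdb(gffFile, seqinfo_cg, tax_id = 5478)
```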


This is the solution! Thanks. Maybe taxonomyId = NA should be mentioned in the error message.


I'll look into clarifying the error message.
