Question: Forge a BSgenome data package
jodiera wrote (5 months ago):

 

We work on the legume chickpea, whose genome has recently been released, but it does not have a BSgenome package so far. I thought I'd try to make one.

Following the vignette I have placed some FASTA chromosomes in a folder. I have started with a small example, so I have used just the first two chromosomes. I have two FASTA files named “Ca_chr1.fa” and “Ca_chr2.fa” located in 

/Users/Cicer/seqs/

 

Then I created a seed file ("cicer"), which is also in that folder. The seed file looks like this:

Package: BSgenome.Carietinum.NCBI.ca1
Title: Cicer arietinum (Chickpea) full genome (NCBI version ASM33114v1)
Description: Cicer arietinum (Chickpea) full genome as provided by NCBI (ASM33114v1, Jan. 2013) and stored in Biostrings objects.
Version: 1.0
organism: Cicer arietinum
species: Chickpea
provider: NCBI
provider_version: ASM33114v1
release_date: Jan. 2013
release_name: BGI-Shenzhen ASM33114v1
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCF_000331145.1/#/def
organism_biocview: Cicer_arietinum
BSgenomeObjname: Carietinum
seqnames: paste("Ca_", paste("chr", c(1:2), sep=""), sep="")
seqs_srcdir: /Users/Cicer/seqs/

After running `forgeBSgenomeDataPkg("path/to/cicer")` I get this error:

 

Error in makeS4FromList("BSgenomeDataPkgSeed", x) : 
  some names in 'x' are not valid BSgenomeDataPkgSeed slots (species)
In addition: Warning message:
In readLines(infile, n = 25000L) :
  incomplete final line found on './seqs/cicer'

Any suggestions are greatly appreciated. Thanks.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] BSgenome_1.38.0      rtracklayer_1.30.4   Biostrings_2.38.4    XVector_0.10.0      
[5] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.8        S4Vectors_0.8.11    
[9] BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
 [1] XML_3.98-1.5               Rsamtools_1.22.0           GenomicAlignments_1.6.3   
 [4] bitops_1.0-6               futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.3        lambda.r_1.1.9             BiocParallel_1.4.3        
[10] tools_3.2.3                Biobase_2.30.0             RCurl_1.95-4.8            
[13] SummarizedExperiment_1.0.2

 

 

 

 

 

James W. MacDonald (United States) wrote (4 months ago):

The first issue has to do with your seed file. You have to follow the instructions in the vignette exactly, or you will get errors. And the error you get points to the problem:

Error in makeS4FromList("BSgenomeDataPkgSeed", x) : 
  some names in 'x' are not valid BSgenomeDataPkgSeed slots (species)

That error can be interpreted as saying that you have this thing in your seed file called 'species' and it's not valid. And if you look at section 2.2 in the vignette, you will see that there is no 'species' field for the description file. If you add things that shouldn't be there, you will get errors.
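For illustration, here is roughly what the seed file from the question would look like with the offending field renamed. This is only a sketch: it assumes that 'common_name' is the replacement for 'species' (as the commit message quoted further down in this thread says) and that every other field is still accepted unchanged by your installed BSgenome version, so check it against the vignette that ships with your version.

```
Package: BSgenome.Carietinum.NCBI.ca1
Title: Cicer arietinum (Chickpea) full genome (NCBI version ASM33114v1)
Description: Cicer arietinum (Chickpea) full genome as provided by NCBI (ASM33114v1, Jan. 2013) and stored in Biostrings objects.
Version: 1.0
organism: Cicer arietinum
common_name: Chickpea
provider: NCBI
provider_version: ASM33114v1
release_date: Jan. 2013
release_name: BGI-Shenzhen ASM33114v1
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCF_000331145.1/#/def
organism_biocview: Cicer_arietinum
BSgenomeObjname: Carietinum
seqnames: paste("Ca_", paste("chr", c(1:2), sep=""), sep="")
seqs_srcdir: /Users/Cicer/seqs/
```

The only change from the original is the 'species' line; make sure the file ends with a line break after the last field.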

The second message is 'incomplete final line', which is a warning rather than an error. It means exactly what you might imagine: the final line of the file named in the warning (here the seed file, './seqs/cicer') is incomplete. This can occur in any number of ways, I suppose, but the general idea is that the final line is missing the expected line ending, so readLines() complains when it reaches the end of the file.
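If you want to check both things before forging, something along these lines might help. It is a minimal sketch in base R: `ends_with_newline()` and `unknown_seed_fields()` are hypothetical helper names made up for this post (they are not part of BSgenome), and the `valid` vector in the demo is a shortened, illustrative list rather than the real set of allowed fields.

```r
# Hypothetical helper: TRUE if the file's last byte is a newline ("\n").
ends_with_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  identical(bytes[length(bytes)], as.raw(0x0a))
}

# Hypothetical helper: seed-file field names that are not listed in 'valid'.
unknown_seed_fields <- function(path, valid) {
  setdiff(colnames(read.dcf(path)), valid)
}

# Demo on a throwaway file that mimics the two problems in this thread:
# an unexpected 'species' field and no line ending after the last line.
seed <- tempfile()
cat("Package: BSgenome.Carietinum.NCBI.ca1\nspecies: Chickpea", file = seed)

if (!ends_with_newline(seed)) {
  cat("\n", file = seed, append = TRUE)  # add the missing line ending
}

# 'valid' is an illustrative subset only, not the full list of seed fields.
unknown_seed_fields(seed, valid = c("Package", "common_name"))
```

Judging from the wording of the error message, the real list of valid names is probably what `slotNames("BSgenomeDataPkgSeed")` returns once BSgenome is attached, but treat that as an assumption and compare with the vignette for your version.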


@Jim: the OP is using an old BioC (3.2). Maybe that's why s/he's putting species in his/her seed file.

@jodiera: We don't support old BioC versions. Please update to the current version (BioC 3.4, requires R 3.3). The current version of the BSgenome package is 1.42.0 and the BSgenomeForge vignette in it has been updated since BSgenome 1.38.0.

H.

(reply written 4 months ago by Hervé Pagès)

@Hervé Well maybe, but doesn't 2015-03-26 precede BioC 3.1? By my count the last time species was a valid name was in BioC 3.0...

------------------------------------------------------------------------
r101235 | hpages@fhcrc.org | 2015-03-26 13:33:49 -0700 (Thu, 26 Mar 2015) | 3 lines

Replace 'species' field with 'common_name' in BSgenome data package seed and
DESCRIPTION files.

 

(reply written 4 months ago by James W. MacDonald)

You're right. I just noticed the OP was using an old BioC and remembered that species went away at some point, but was too lazy to find out exactly when that change was made. So s/he might actually have been looking at an even older version of the BSgenomeForge vignette.

Thanks for helping. Thumb up from me and I hope the OP will do the same :-)

(reply written 4 months ago by Hervé Pagès)
jodiera wrote (3 days ago):

Thanks for the answers. This is exactly what was causing the problems. It worked once I:

1) Followed the latest updated vignette exactly

2) Rebuilt the file

 

 
