Can BSgenome forge from non-UCSC/NCBI assemblies?


Not sure if I've missed something basic, but is it possible to run forgeBSgenomeDataPkg on assemblies that aren't available from NCBI or UCSC? I'm getting an error which suggests it isn't. Is that the case, or have I done something wrong in my seed file?

> library(BSgenome)
> forgeBSgenomeDataPkg("R://Sophie/Pea_RNASeq/genomes/BSGenomes_seed_PisumSativum.txt")
Error in .make_Seqinfo_from_genome(genome) : 
  "Pisum_sativum_v1a" is not a registered NCBI assembly or UCSC genome (use
  registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC
  assemblies/genomes currently registered in the GenomeInfoDb package)
In addition: Warning messages:
1: In readLines(infile, n = 25000L) :
  incomplete final line found on 'R://Sophie/Pea_RNASeq/genomes/BSGenomes_seed_PisumSativum.txt'
2: In forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,  :
  field 'release_name' is deprecated

The seed file reads:

Package: BSgenome.Psativum.URGI.Pisum_sativum_v1a
Title: Full genome sequence for Pisum sativum (URGI; v1a)
Description: Full genome sequence for Pisum sativum (URGI; v1a) see
Version: 1a
organism: Pisum sativum
common_name: Pea
genome: Pisum_sativum_v1a
provider: URGI
provider_version: Pisum_sativum_v1a
release_date: Jan. 2019
release_name: Pisum sativum v1a
organism_biocview: Pisum_sativum
BSgenomeObjname: Psativum
SrcDataFiles: Split fasta file from (only Chr1-Chr7, no scaffolds)
seqs_srcdir: R://Sophie/Pea_RNASeq/genomes
seqnames: paste("chr",c(1:7))

Hi Sophie,

It's always possible to forge a BSgenome data package as long as you have access to the sequences (FASTA file(s) or a 2bit file). However, when the sequences are stored in a FASTA file, it's highly recommended to write them to a 2bit file first and then use the 2bit file to forge the package. This step is also an opportunity to rename and/or reorder the sequences. Then, in your seed file, do not list the sequences (i.e. no seqnames entry), but do make sure to list the circular sequences (circ_seqs entry). See Forging a BSGenome for an example, and let me know here if you need further help with this.
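For readers following along, the FASTA-to-2bit step could look roughly like the sketch below. It assumes the Biostrings and rtracklayer packages are installed. The helper name fasta_to_2bit and its keep argument are made up for illustration and are not part of either package.

```r
library(Biostrings)   # readDNAStringSet()
library(rtracklayer)  # export.2bit()

# Hypothetical helper (not part of BSgenome): read a FASTA file, optionally
# subset/reorder the sequences, and write the result to a 2bit file.
fasta_to_2bit <- function(fasta, twobit, keep = NULL) {
  dna <- readDNAStringSet(fasta)

  # Keep only the first whitespace-delimited token of each FASTA header,
  # so ">chr1 some description" becomes "chr1".
  names(dna) <- sub("\\s.*$", "", names(dna))

  if (!is.null(keep)) {
    absent <- setdiff(keep, names(dna))
    if (length(absent) > 0) {
      stop("sequences not found in FASTA: ", paste(absent, collapse = ", "))
    }
    # Character subsetting both drops unwanted sequences (e.g. scaffolds)
    # and puts the remaining ones in the order given by 'keep'.
    dna <- dna[keep]
  }

  export.2bit(dna, twobit)
  invisible(twobit)
}
```

A call for the pea assembly might then be fasta_to_2bit("Pisum_sativum_v1a.fa", "Pisum_sativum_v1a.2bit", keep = paste0("chr", 1:7)), where both file names are placeholders. Note that paste0() is used rather than paste(), which would insert a space and give "chr 1". As far as I know, 2bit only stores A, C, G, T and N, so sequences containing other IUPAC codes would need cleaning up before export.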




I'm having the same problem - but this doesn't address the issue of NCBI/UCSC registered assemblies.


I figured out what I was doing wrong. The seed file needs to specify:

circ_seqs: character(0)

and point to the 2bit file with seqs_srcdir and seqfile_name:

seqs_srcdir: /path/to/2bit/directory
seqfile_name: file.2bit
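Putting the pieces from this thread together, a seed file for the pea assembly might look something like the sketch below. Treat it as an untested illustration rather than a known-good file. The 2bit file name is a placeholder. The seqnames entry is dropped as advised above, and release_name is dropped because the warning in the question flags it as deprecated. Two changes are my own assumptions: Version is set to 1.0.0 because R package versions must be numeric ("1a" is not valid), and the last part of the package name is shortened to v1a because R package names cannot contain underscores.

```
Package: BSgenome.Psativum.URGI.v1a
Title: Full genome sequence for Pisum sativum (URGI; v1a)
Description: Full genome sequence for Pisum sativum (URGI; v1a)
Version: 1.0.0
organism: Pisum sativum
common_name: Pea
genome: Pisum_sativum_v1a
provider: URGI
release_date: Jan. 2019
organism_biocview: Pisum_sativum
BSgenomeObjname: Psativum
circ_seqs: character(0)
seqs_srcdir: R://Sophie/Pea_RNASeq/genomes
seqfile_name: Pisum_sativum_v1a.2bit
```

Ending the file with a newline should also get rid of the "incomplete final line" warning from readLines.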
