How to construct a BSgenome object from a custom genome assembly using BSgenomeForge?
1
0
Entering edit mode
mat149 ▴ 80
@mat149-11450
Last seen 10 days ago
United States

Hi,

I have a custom genome assembly contained in .fasta format. Sequences for each of the chromosomes are compiled into a single .fasta file. I'd like to forge a BSgenome object for this genome assembly (not publicly available) I've looked through the vignette and I'm stuck on how to construct the seed object. I know how to generate a BSgenome object using forgeBSgenomeDataPkgFromNCBI(), but I'm stuck on how to construct a BSgenome object from a genome assembly without an accession. Can anyone guide me through the next steps?

Thanks!

setwd("C:\\...\\Desktop\\wd\\dump\\seqs_srcdir")

e122<-readDNAStringSet("E122.fasta", format="fasta",use.names=TRUE)

chroms<-Seqinfo(
    seqnames=c("chrA","chrB","chrC","chrD","chrE","chrF","chrMT"),
    seqlengths=width(e122),
    isCircular=as.logical(c("FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE"),
    genome=c("E122","E122","E122","E122","E122","E122","E122")))

chroms
Seqinfo object with 7 sequences (1 circular) from an unspecified genome:
  seqnames seqlengths isCircular genome
  chrA        2326463      FALSE   <NA>
  chrB        3127439      FALSE   <NA>
  chrC        3366774      FALSE   <NA>
  chrD        3699297      FALSE   <NA>
  chrE        4325132      FALSE   <NA>
  chrF        4126616      FALSE   <NA>
  chrMT         47890       TRUE   <NA>
BSgenome BSgenomeForge • 260 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 20 hours ago
United States

There is an example in the help page for forgeBSgenomePkgFromTwobitFile. Have you tried emulating that?

0
Entering edit mode

I've tried the following, but I must be missing something:

setwd("C:\\Users\\....\\wd\\dump\\seqs_srcdir")

e122<-readDNAStringSet("E122.fasta", format="fasta",use.names=TRUE)
    # export(e122, "e122.2bit", format = "2bit")

chroms<-Seqinfo(
    seqnames=c("A","B","C","D","E","F","MT"),
    seqlengths=width(e122),
    isCircular=as.logical(c("FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE"),
    genome=c("E122","E122","E122","E122","E122","E122","E122")))

my_file <- read.dcf("seed.txt", fields = NULL, all = FALSE, keep.white = NULL)
write.dcf(my_file , file = "seed.dcf", append = FALSE, useBytes = FALSE, indent = 0.1 * getOption("width"), width = 0.9 * getOption("width"), keep.white = NULL)
forgeBSgenomeDataPkg("seed.dcf")

**Error in .make_Seqinfo_from_genome(genome) : 
  "e122.2bit" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or
  registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb
  package)

This is my seed.dcf file:

Package: BSgenome.YLE122.2bit
Title: Custom genome for YL
Description: Full genome sequences for YL.
Version: 1.0.0
Author: MT <mt3@gmail.com>
Maintainer: MT <mt3@gmail.com>
organism: YL
common_name: YL
provider: Custom
provider_version: 1
source_url: C:/Users/.../wd/dump/seqs_srcdir
seqs_srcdir: C:/Users/.../wd/dump/seqs_srcdir/e122.2bit
organism_biocview: Y_L
release_date: 2025-09-15
seqnames: c("A","B","C","D","E","F","MT")
genome: e122.2bit
ondisk_seq_format: 2bit
ADD REPLY
1
Entering edit mode

You are. The reason I pointed you to forgeBSgenomeDataPkgFromTwobitFile is because it's much simpler, doesn't require a seed file, and is the preferred way of making a BSgenome package. In addition, you don't appear to be setting your seed file up correctly anyway. But since the function I have pointed you to only requires the 2bit file, and is easier to use, I am not going to try to figure out where you went wrong, and instead will point you to the new function instead.

ADD REPLY

Login before adding your answer.

Traffic: 1241 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6