I am working with the MEDIPS Bioconductor package. I was able to successfully create a custom bee genome (I know it's already available for MEDIPS, but I wanted to build a different one myself).
The genome folder contains 16 separate FASTA files, one per chromosome, named chr1, chr2, and so on.
The seed file is given below:
Package: BSgenome.Amellifera.UCSC.apiMe1
Title: Full genome sequence for Apis mellifera (for demo purpose)
Description: Full genome sequences for Apis mellifera (Honey bee) as provided by UCSC (for demo purpose) and stored in Biostrings objects.
Version: 1.0
organism: Apis mellifera
common_name: Bee
provider: UCSC
provider_version: apiMe1
release_date: Nov. 2016
release_name: UCSC apiMe_1.0
source_url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/195/GCF_000002195.4_Amel_4.5/GCF_000002195.4_Amel_4.5_genomic.fna.gz
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste("chr",c(1:16), sep="")
circ_seqs: "chr6"
SrcDataFiles: coming from UCSC for demo purpose
PkgExamples: genome$chr1 # same as genome[["chr1"]]
seqs_srcdir: /home/bioinfo11.corp/Desktop/Important/Vijay_Lakhujani/New_explorations/MEDIPS/dataset_by_sir/genome_files
I ran forgeBSgenomeDataPkg("path_to_my_seed_file") and it worked, creating a package folder in the same directory. The folder contains the following files and subfolders:
DESCRIPTION inst man NAMESPACE R
Now, when I run available.genomes(), my custom genome is not listed.
Is this expected, or is something wrong?
PS: I referred to the PDF linked below; page 3 says we can use a collection of compressed FASTA files (chrI.fa.gz, chrII.fa.gz, chrIII.fa.gz, ..., chrXXI.fa.gz, chrM.fa.gz and chrUn.fa.gz).
Interestingly, forging did not work when I gzipped the files, but it did work with the uncompressed files. Could that be an issue?
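My guess about the gzip failure, as far as I understand the vignette: the forge step looks for one file per entry in seqnames, named prefix + seqname + suffix, and the suffix (the seqfiles_suffix seed field) defaults to .fa. My seed file does not set it, so gzipped files may simply not have been found. Below is a minimal base R sketch of that name matching, not the actual BSgenome code. The helper expected_files is hypothetical, and the directory is a temporary stand-in for my seqs_srcdir, filled with empty placeholder files.

```r
# Hypothetical helper (not part of BSgenome): build the file names implied
# by a seed file's seqnames plus a prefix and suffix.
expected_files <- function(seqnames, prefix = "", suffix = ".fa") {
  paste0(prefix, seqnames, suffix)
}

# Same expression as the seqnames field in my seed file.
seqnames <- paste("chr", c(1:16), sep = "")

# Stand-in for seqs_srcdir: a temporary directory holding empty
# placeholder files named like gzipped FASTA files.
srcdir <- tempfile("genome_files")
dir.create(srcdir)
invisible(file.create(file.path(srcdir, expected_files(seqnames, suffix = ".fa.gz"))))

# With the default suffix (.fa), none of the gzipped files match.
found_default <- file.exists(file.path(srcdir, expected_files(seqnames)))

# Once the suffix matches the files on disk, all 16 are found.
found_gz <- file.exists(file.path(srcdir, expected_files(seqnames, suffix = ".fa.gz")))
```

If this is the cause, adding a seqfiles_suffix line that matches the actual file names to the seed file should presumably let the gzipped files be picked up, but I have not confirmed that.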
Tutorial on forging a genome (BSgenomeForge vignette): https://www.bioconductor.org/packages/devel/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf
