Question

How to make my custom genome available in MEDIPS?

0

Entering edit mode

Vijay Lakhujani • 0

@vijay-lakhujani-11556

Last seen 7.3 years ago

India

I am working with MEDIPS bioconductor. I was able to successfully create a custom BEE genome (I know its already there in MEDIPS, but for some reason I wanted to make a different one myself).

The genome folder consists of 16 separate fasta files for each chromosomes named like chr1, chr2 and so on.

The seed file is given below:

Package: BSgenome.Amellifera.UCSC.apiMe1
Title: Full genome sequence for Apis mellifera (for demo purpose)
Description: Full genome sequences for Apis mellifera (Honey bee) as provided by UCSC (for demo purpose) and stored in Biostrings objects.
Version: 1.0
organism: Apis mellifera
common_name: Bee
provider: UCSC
provider_version: apiMe1
release_date: Nov. 2016
release_name: UCSC apiMe_1.0
source_url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/195/GCF_000002195.4_Amel_4.5/GCF_000002195.4_Amel_4.5_genomic.fna.gz
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste("chr",c(1:16), sep="")
circ_seqs: "chr6"
SrcDataFiles: coming from UCSC for demo purpose
PkgExamples: genome$chr1  # same as genome[["chr1"]]
seqs_srcdir: /home/bioinfo11.corp/Desktop/Important/Vijay_Lakhujani/New_explorations/MEDIPS/dataset_by_sir/genome_files

I ran forgeBSgenomeDataPkg("path_to_my_seed_file") and it worked creating a folder in the same directory. It contains following folders:

DESCRIPTION
inst
man
NAMESPACE
R

Now, when I run the command available.genomes(), I can't see my custom genome.

Is it expected or there is something wrong?

PS: I referred to below pdf and page#3 says , we can use a collection of compressed FASTA files (chrI.fa.gz,chrII.fa.gz,chrIII.fa.gz, ...,chrXXI.fa.gz,chrM.fa.gz andchrUn.fa.gz).

Interestingly, when I gzipped the files, it did not work and when I used the unzipped files, I was able to forge the genome. Could that be an issue?

Forging genome in MEDIPS tutorial :https://www.bioconductor.org/packages/devel/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf

MEDIPS • 1.5k views

ADD COMMENT • link 9.2 years ago Vijay Lakhujani • 0

score 0 · Answer 1 · 2016-11-09

Answering my own question:

1. It looks like it is expected that the forged custom genome will not be listed when you run the command available.genomes() in R. This is because it happened in my case and despite of this, I was able to perform the downstream analysis without any issue. To ensure that the custom genome is available inside MEDIPS, you can load the genome package; please see the below example:

library(BSgenome.Amellifera.UCSC.apiMe1)

If it does not throw any error, it means you have successfully loaded the genome and it is ready to use.

2. Check the extension of the fasta files that you have used in your genome. Edit the seed file as follows:

If it is ".fa.gz" , add the seqfiles field in the seed file as follows

seqfiles_suffix: .fa.gz

If it is ".fa" , add the seqfiles field in the seed file as follows

seqfiles_suffix: .fa