How to make my custom genome available in MEDIPS?
1
0
Entering edit mode
@vijay-lakhujani-11556
Last seen 3.0 years ago
India

I am working with MEDIPS bioconductor. I was able to successfully create a custom BEE genome (I know its already there in MEDIPS, but for some reason I wanted to make a different one myself).

The genome folder consists of 16 separate fasta files for each chromosomes named like chr1, chr2 and so on.

The seed file is given below:

Package: BSgenome.Amellifera.UCSC.apiMe1
Title: Full genome sequence for Apis mellifera (for demo purpose)
Description: Full genome sequences for Apis mellifera (Honey bee) as provided by UCSC (for demo purpose) and stored in Biostrings objects.
Version: 1.0
organism: Apis mellifera
common_name: Bee
provider: UCSC
provider_version: apiMe1
release_date: Nov. 2016
release_name: UCSC apiMe_1.0
source_url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/195/GCF_000002195.4_Amel_4.5/GCF_000002195.4_Amel_4.5_genomic.fna.gz
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste("chr",c(1:16), sep="")
circ_seqs: "chr6"
SrcDataFiles: coming from UCSC for demo purpose
PkgExamples: genome$chr1  # same as genome[["chr1"]]
seqs_srcdir: /home/bioinfo11.corp/Desktop/Important/Vijay_Lakhujani/New_explorations/MEDIPS/dataset_by_sir/genome_files

I ran forgeBSgenomeDataPkg("path_to_my_seed_file") and it worked creating a folder in the same directory. It contains following folders: 

DESCRIPTION
inst
man
NAMESPACE
R

Now, when I run the command available.genomes(), I can't see my custom genome.

Is it expected or there is something wrong?

PS: I referred to below pdf and page#3 says , we can use a collection of compressed FASTA files (chrI.fa.gz,chrII.fa.gz,chrIII.fa.gz, ...,chrXXI.fa.gz,chrM.fa.gz andchrUn.fa.gz).

Interestingly, when I gzipped the files, it did not work and when I used the unzipped files, I was able to forge the genome. Could that be an issue?

Forging genome in MEDIPS tutorial :https://www.bioconductor.org/packages/devel/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf

 

MEDIPS • 642 views
ADD COMMENT
0
Entering edit mode
@vijay-lakhujani-11556
Last seen 3.0 years ago
India

Answering my own question:

1. It looks like it is expected that the forged custom genome will not be listed when you run the command available.genomes() in R. This is because it happened in my case and despite of this, I was able to perform the downstream analysis without any issue. To ensure that the custom genome is available inside MEDIPS, you can load the genome package; please see the below example:

library(BSgenome.Amellifera.UCSC.apiMe1)

If it does not throw any error, it means you have successfully loaded the genome and it is ready to use.

2. Check the extension of the fasta files that you have used in your genome. Edit the seed file as follows:

If it is ".fa.gz" , add the seqfiles  field in the seed file as follows

seqfiles_suffix: .fa.gz

If it is ".fa" , add the seqfiles  field in the seed file as follows

seqfiles_suffix: .fa
ADD COMMENT

Login before adding your answer.

Traffic: 363 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6