BSgenome gap mask conundrum
Ugo Borello ▴ 340
@ugo-borello-5753
Last seen 5.8 years ago
France
Hi everybody,

In forging my custom BSgenome data package I ran into a problem with gap masks.

For my genome of interest, NCBI provides 2 different gap masks for each assembled chromosome: the chr?.comp.agp file (chromosome from component AGP) and the chr?.agp file (chromosome from scaffold AGP). There is only one agp file for the unlocalized scaffold sequences and one for the unplaced scaffold sequences.

When I forge my BSgenome package using only the assembled chromosomes, everything goes well. In this case I set the nmask_per_seq field in the seed file to 3: 2 agp masks (the comp.agp and .agp files) and 1 repeatmasker mask for each assembled chromosome.

I get the same positive result when I forge my BSgenome package using the assembled chromosomes, the unlocalized, and the unplaced scaffold sequences, with the nmask_per_seq field in the seed file set to 2, because I include in the package 1 agp mask (the .agp file) and 1 repeatmasker mask for all the fasta files.

If you are still with me after this boring "maskerade", you can easily anticipate that forgeBSgenomeDataPkg() throws an error when I try to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences. In this case I set the nmask_per_seq field in the seed file to 3.

My questions are:

- Is there a way to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences? If so, what should the value of the nmask_per_seq field in the seed file be?
- Should I simply ignore the comp.agp files? Are they useful for the assembled chromosomes?

Thank you very much for your help,
Ugo
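In case it helps with the second question, here is a minimal sketch (in Python, outside of BSgenome) of how one might check whether the comp.agp file contributes any gaps that the .agp file does not already have. It assumes the standard tab-separated AGP layout, where column 5 is the component type and the values N and U mark gap lines; the function names and the example file names are hypothetical, not part of any package.

```python
import sys


def read_agp_gaps(lines):
    """Return the set of (object, start, end) gap intervals found in AGP lines.

    Assumes standard AGP columns: object, object_beg, object_end,
    part_number, component_type, ... Gap lines have component_type N or U.
    """
    gaps = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a well-formed AGP record
        if fields[4] in ("N", "U"):
            gaps.add((fields[0], int(fields[1]), int(fields[2])))
    return gaps


def compare_gap_sets(scaffold_agp_lines, component_agp_lines):
    """Split the gap intervals of two AGP files into shared and file-specific sets."""
    scaffold_gaps = read_agp_gaps(scaffold_agp_lines)
    component_gaps = read_agp_gaps(component_agp_lines)
    return {
        "shared": scaffold_gaps & component_gaps,
        "only_scaffold": scaffold_gaps - component_gaps,
        "only_component": component_gaps - scaffold_gaps,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_agp.py chr1.agp chr1.comp.agp
    with open(sys.argv[1]) as scaffold_file, open(sys.argv[2]) as component_file:
        result = compare_gap_sets(scaffold_file, component_file)
    for name, intervals in result.items():
        print(name, len(intervals))
```

If "only_component" comes back empty for every chromosome, the comp.agp mask would presumably add nothing beyond the .agp mask. The comparison only matches identical intervals, so partially overlapping gaps would show up as file-specific on both sides.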
BSgenome • 1.6k views