Question

BSgenome gap mask conundrum

Ugo Borello ▴ 340

@ugo-borello-5753

Last seen 6.8 years ago

France

Hi everybody,

In forging my custom BSgenome data package I encountered a problem with gap masks. For my genome of interest, NCBI provides 2 different gap masks for each assembled chromosome: the chr?.comp.agp file (chromosome from component AGP) and the chr?.agp file (chromosome from scaffold AGP). There is only one AGP file for the unlocalized scaffold sequences and one for the unplaced scaffold sequences.

When I forge my BSgenome package using only the assembled chromosomes, everything goes very well. In this case I set the nmask_per_seq field in the seed file to 3: 2 AGP masks (the comp.agp and .agp files) and 1 RepeatMasker mask for each assembled chromosome.

I get the same positive result when I forge my BSgenome package using the assembled chromosomes, the unlocalized, and the unplaced scaffold sequences, with the nmask_per_seq field in the seed file set to 2 (because I include in the package 1 AGP mask (the .agp file) and 1 RepeatMasker mask for all the FASTA files).

If you are still with me after this boring "maskerade", you can easily anticipate that forgeBSgenomeDataPkg() throws an error when I try to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences. In this case I set the nmask_per_seq field in the seed file to 3.

My questions are:

- Is there a way to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences? If so, what should the value of the nmask_per_seq field in the seed file be?
- Shall I simply ignore the comp.agp files? Are they useful for the assembled chromosomes?

Thank you very much for your help,
Ugo

BSgenome • 1.8k views

11.7 years ago Ugo Borello ▴ 340