Hi everybody,
While forging my custom BSgenome data package, I ran into a problem with gap masks.
For my genome of interest, NCBI has 2 different gap masks for each assembled chromosome: the chr?.comp.agp file (chromosome from component AGP) and the chr?.agp file (chromosome from scaffold AGP). There is only one AGP file for the unlocalized and one for the unplaced scaffold sequences.
So when I forge my BSgenome package using only the assembled chromosomes, everything goes very well. In this case I set the nmask_per_seq field in the seed file to 3: 2 AGP masks (the comp.agp and .agp files) and 1 RepeatMasker mask for each assembled chromosome.
I get the same positive result when I forge my BSgenome package using the assembled chromosomes, the unlocalized, and the unplaced scaffold sequences, with the nmask_per_seq field in the seed file set to 2 (because I include in the package 1 AGP mask (the .agp file) and 1 RepeatMasker mask for all the FASTA files).
If you are still with me after this boring "maskerade", you can easily anticipate that forgeBSgenomeDataPkg() throws an error when I try to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences. In this case I set the nmask_per_seq field in the seed file to 3.
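To make the mismatch concrete, here is a minimal Python sketch. It is not BSgenome code: the sequence names and mask file names are hypothetical, and it only counts the mask files available per sequence to show that no single nmask_per_seq value describes both groups.

```python
from collections import defaultdict

# Hypothetical mask files per sequence (names are illustrative only).
masks_per_seq = {
    "chr1": ["chr1.comp.agp", "chr1.agp", "chr1.rm.out"],
    "chr2": ["chr2.comp.agp", "chr2.agp", "chr2.rm.out"],
    "unlocalized": ["unlocalized.agp", "unlocalized.rm.out"],
    "unplaced": ["unplaced.agp", "unplaced.rm.out"],
}


def group_by_mask_count(masks):
    """Map each mask count to the sorted list of sequences with that many masks."""
    groups = defaultdict(list)
    for seq, files in masks.items():
        groups[len(files)].append(seq)
    return {count: sorted(seqs) for count, seqs in groups.items()}


groups = group_by_mask_count(masks_per_seq)

# A single per-package mask count only fits if every sequence has the same count.
uniform = len(groups) == 1

for count in sorted(groups):
    print(f"{count} masks: {', '.join(groups[count])}")
print("one value fits all sequences:", uniform)
```

With the layout above, the assembled chromosomes fall in the 3-mask group and the other sequences in the 2-mask group, which is exactly the case that fails for me.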
My questions are:
- Is there a way to use 3 masks for the assembled chromosomes and 2 for the rest of the sequences? If so, what should the value of the nmask_per_seq field in the seed file be?
- Shall I simply ignore the comp.agp files? Are they useful for the assembled chromosomes?
Thank you very much for your help
Ugo