Motivation and goal: To prepare a BSgenome data package for the model organism Setaria italica. According to section 2.3 of the ggbio vignette dated August 26, 2014, one appears to be needed to add a reference track, which I eventually want to use in an 'overview plot' (chapter 4 of the vignette).
Problem: I received the following error message while attempting to create a new BSgenome package following the vignette 'How to forge a BSgenome data package' dated Oct. 13, 2014. I previously upgraded to Bioconductor 3.0, though I don't know how to verify that the upgrade was successful. Thanks for any comments.
> library(BSgenome)
> forgeBSgenomeDataPkg("/Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/BSgenome.Sitalica.Ensembl.22-seed")
Creating package in ./BSgenome.Sitalica.Ensembl.22
Error in getSeqSrcpaths(seqnames, prefix = prefix, suffix = suffix, seqs_srcdir = seqs_srcdir) :
/Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr1.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr2.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr3.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr4.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr5.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr6.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr7.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr8.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr9.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr1_random.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr2_random.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr3_random.fa, /Users/bterry/macbookpro2014/keenanres/Sitalica/packs/sitalica22/chr4_random.fa,
>
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biobase_2.26.0 BSgenome_1.34.0 rtracklayer_1.26.1 Biostrings_2.34.0
[5] XVector_0.6.0 GenomicRanges_1.18.1 GenomeInfoDb_1.2.0 IRanges_2.0.0
[9] S4Vectors_0.4.0 BiocGenerics_0.12.0
loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.4 BBmisc_1.7
[4] BiocParallel_1.0.0 bitops_1.0-6 brew_1.0-6
[7] checkmate_1.5.0 codetools_0.2-9 DBI_0.3.1
[10] digest_0.6.4 fail_1.2 foreach_1.4.2
[13] GenomicAlignments_1.2.0 iterators_1.0.7 RCurl_1.95-4.3
[16] Rsamtools_1.18.0 RSQLite_0.11.4 sendmailR_1.2-1
[19] stringr_0.6.2 tools_3.1.1 XML_3.98-1.1
[22] zlibbioc_1.12.0
Hi Philip,
Are you sure you need the masks? Forging a BSgenome data package with masked sequences can be tricky. The good news is that most of the time it's not needed, and a BSgenome data package with bare sequences is enough. There are only a few use cases where a BSgenome package with masks offers some (generally minor) advantage. So I would strongly recommend that you use the BSgenome package you forged (BSgenome.Sitalica.Ensembl.22) unless you have a good reason for forging and using BSgenome.Sitalica.Ensembl.22.masked.
Anyway, the error you got says that mask_per_seq is not a valid field for your seed file. The correct field is nmask_per_seq (as you can see in the BSgenomeForge vignette). Also, I should probably clarify this in the vignette, but you cannot have the RM masks without having the AGAPS and AMB masks. More precisely, if you want the RM masks, you need to have the following masks in that order: (1) AGAPS, (2) AMB, (3) RM. You'll also need to set nmask_per_seq to 3.
Let me know how it goes.
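In case it helps, here is a rough sketch in base R of how you could sanity-check the field name before forging again. It is only an illustration: the seed fragment is written to a temporary file, and the Package and RefPkgname values are placeholders I am assuming, not the contents of your real seed file. Only nmask_per_seq and the mask order (AGAPS, AMB, RM) come from the explanation above.

```r
# Hypothetical fragment of a masked seed file (placeholder values, not the
# real file). Seed files are in DCF format, so read.dcf() can parse them.
seed_text <- c(
  "Package: BSgenome.Sitalica.Ensembl.22.masked",
  "RefPkgname: BSgenome.Sitalica.Ensembl.22",
  "nmask_per_seq: 3"
)
seed_file <- tempfile(fileext = "-seed")
writeLines(seed_text, seed_file)

# read.dcf() returns a character matrix with one column per field.
seed <- read.dcf(seed_file)
fields <- colnames(seed)

# Catch the misspelled field name before calling the forge function.
if ("mask_per_seq" %in% fields) {
  stop("'mask_per_seq' is not a valid field; use 'nmask_per_seq'")
}

has_field <- "nmask_per_seq" %in% fields
nmask <- if (has_field) as.integer(seed[1, "nmask_per_seq"]) else NA_integer_

# Masks must be built in this order, so RM requires nmask_per_seq = 3.
mask_order <- c("AGAPS", "AMB", "RM")
masks_built <- mask_order[seq_len(nmask)]
print(masks_built)
```

To check your own file, replace seed_file with the path to your masked seed file and skip the writeLines() step.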
H.