(BSgenome) forgeBSgenomeDataPkg for Sus scrofa problem
@elisabetta-manduchi-575
Hello,

I'm trying to build a data package for Sus scrofa with BSgenome (R version 2.13.2 and BSgenome version 1.20.0). My seed file is copied at the bottom of this email.

I've downloaded the sequence files from UCSC and checked the md5sums. I've also downloaded the gap.txt and mask files (chr*.fa.out and chr*.bed) from UCSC (but no md5sums were provided). I've followed the instructions from http://bioconductor.org/packages/2.8/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf and I'm getting the following error:

    > forgeBSgenomeDataPkg("./BSgenome.Sscrofa.UCSC.susScr2-seed")
    Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, masks_srcdir = masks_srcdir, :
      values for symbols NMASKPERSEQ are not single strings

Can you advise on what the problem might be?

Thanks,
Elisabetta

*SEED file BSgenome.Sscrofa.UCSC.susScr2-seed*

    Package: BSgenome.Sscrofa.UCSC.susScr2
    Title: Sus scrofa (Pig) full genome (UCSC version susScr2)
    Description: Sus scrofa (Pig) full genome as provided by UCSC (susScr2, Nov. 2009)
    Version: 0.1-0
    Author: Elisabetta Manduchi <manduchi at pcbi.upenn.edu>
    Maintainer: Elisabetta Manduchi <manduchi at pcbi.upenn.edu>
    License: GPL-3
    organism: Sus scrofa
    species: Pig
    provider: UCSC
    provider_version: susScr2
    release_date: Nov. 2009
    release_name: SGSC Sscrofa9.2
    source_url: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/
    organism_biocview: Sus_scrofa
    BSgenomeObjname: Sscrofa
    seqnames: paste("chr", c(1:18, "X", "M"), sep="")
    circ_seqs: "chrM"
    SrcDataFiles1: sequences: all the chr*.fa.gz files from ftp://hgdownload.cse.ucsc.edu/goldenPath/susScr2/chromosomes/
    SrcDataFiles2: AGAPS masks: the gap.txt.gz file from http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/database/; RM masks: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/chromOut.tar.gz; TRF masks: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/chromTrf.tar.gz
    seqs_srcdir: /mnt/files/cbil/data/cbil/UHTS/Davies/AAvsDT_DNAmethyl/working_dir/MEDIPS/BSgenome.Sscrofa.UCSC.susScr2/seqs
    masks_srcdir: /mnt/files/cbil/data/cbil/UHTS/Davies/AAvsDT_DNAmethyl/working_dir/MEDIPS/BSgenome.Sscrofa.UCSC.susScr2/masks
    AGAPSfiles_type: gap
    AGAPSfiles_name: gap.txt
@herve-pages-1542
Seattle, WA, United States
Hi Elisabetta,

Handling of a missing nmask_per_seq field was broken (it should have been set to 0 when missing). I just fixed this in BSgenome release (1.20.1) and devel (1.21.7).

Anyway, in your case, it seems like you *do* have masks, so you need to have the nmask_per_seq field explicitly set to a non-zero value in your seed file. For example, if you have the 4 "standard" masks:

    nmask_per_seq: 4

You can look at the seed file for hg19 in the BSgenome package (BSgenome/inst/extdata/GentlemanLab/BSgenome.Hsapiens.UCSC.hg19-seed) for an example.

Please let me know if you have further questions about this.

Cheers,
H.

On 11-10-07 11:21 AM, Elisabetta Manduchi wrote:
> [...]

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
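For readers applying this fix, the mask-related part of the seed file from the question might end up looking like the fragment below. This is only a sketch: it reuses the AGAPS fields already present in the original seed file and adds the single nmask_per_seq line suggested above. Where the line sits within the file is presumably not significant, since the seed is a list of field/value pairs.

```
nmask_per_seq: 4
AGAPSfiles_type: gap
AGAPSfiles_name: gap.txt
```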
Thanks for the response. I guess I hadn't fully understood this seed file field. I've now set it to 4 (since I have gap.txt plus the chromosomal RM and TRF files, and since no files are needed for the AMB masks) and the function is running now.

Elisabetta

On Mon, 10 Oct 2011, Hervé Pagès wrote:
> [...]
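A small pre-flight check along the following lines might catch a missing nmask_per_seq field before the forge step is run. It is a rough sketch, not part of BSgenome: the helper name `check_nmask_per_seq` is hypothetical, the 0 to 4 range is an assumption based on the "4 standard masks" mentioned in this thread, and it relies on the seed file being readable with base R's `read.dcf()`.

```r
# Hypothetical helper (NOT a BSgenome function): verify that a seed file
# declares nmask_per_seq before calling forgeBSgenomeDataPkg().
check_nmask_per_seq <- function(seed_path) {
  # read.dcf() is base R; it returns a character matrix with one column per field.
  seed <- read.dcf(seed_path)
  if (!"nmask_per_seq" %in% colnames(seed)) {
    stop("seed file has no 'nmask_per_seq' field")
  }
  n <- suppressWarnings(as.integer(seed[1, "nmask_per_seq"]))
  # Assumption: valid values are 0 (no masks) up to the 4 "standard" masks.
  if (is.na(n) || n < 0L || n > 4L) {
    stop("'nmask_per_seq' should be an integer between 0 and 4")
  }
  n
}

# Usage on a throwaway seed file holding only two fields.
tmp_seed <- tempfile()
writeLines(c("Package: BSgenome.Sscrofa.UCSC.susScr2",
             "nmask_per_seq: 4"), tmp_seed)
n_masks <- check_nmask_per_seq(tmp_seed)
print(n_masks)
```

With a seed file like the one originally posted (no nmask_per_seq line), the helper would stop with an explicit message instead of the less obvious NMASKPERSEQ error.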