forgeSeqFiles
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hello everyone,

I am so new to R as well as Bioconductor but found them very helpful so I'm trying to use.
Now I need to make a BSgenome pckg for my own organism. Therefore I have made a folder named seqs_srcdir which contains 17,000 files (one gene_sequence per file), but when I downloaded, and even unzipped BSgenome I kept getting the Error: could not find function "forgeSeqFiles".
I know I need to make the seed files so I gave the command line:

forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)

Error: could not find function "forgeSeqFiles".

I would appreciate it if you could please advice,
Melo

-- output of sessionInfo():

forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)

Error: could not find function "forgeSeqFiles".

--
Sent via the guest posting facility at bioconductor.org.
@herve-pages-1542
Last seen 3 hours ago
Seattle, WA, United States
Hi Melo,

On 12/22/2013 11:47 PM, Melo [guest] wrote:
>
> Hello everyone,
>
> I am so new to R as well as Bioconductor but found them very helpful so I'm trying to use.
> Now I need to make a BSgenome pckg for my own organism. Therefore I have made a folder named seqs_srcdir which contains 17,000 files (one gene_sequence per file),

Trying to forge a BSgenome package from the gene sequences is a bad idea. A lot of tools won't operate properly on this.

A BSgenome data package is intended to represent the full genome of a given organism. The sequences in such a package are chromosomes and/or scaffolds and/or whatever sequences that are considered to constitute the genome assembly of the organism. A lot of tools that operate on BSgenome objects assume that.

For example, it's easy to extract the gene sequences from a BSgenome object if you know the gene coordinates with respect to the assembly. The gene/transcripts/exons/cds coordinates are often stored in a GFF file or similar and can be imported in BioC with tools like makeTranscriptDbFromGFF() followed by a call to genes(), transcripts(), exons(), etc... which will return you the coordinates in a GRanges or GRangesList object. Then use getSeq() on the BSgenome and GRanges objects to extract the sequences as a DNAStringSet object. See ?makeTranscriptDbFromGFF and ?transcripts in the GenomicFeatures package for the details.

But if you have 17,000 files, one gene sequence per file, you could directly load them in a DNAStringSet object by calling readDNAStringSet() on the character vector containing the 17,000 file paths. You can (and should) completely bypass the BSgenome data package in that case.

> but when I downloaded, and even unzipped BSgenome

This is not the recommended way to install a BioC package. Please always use biocLite() for that. See:

http://bioconductor.org/install/

> I kept getting the Error: could not find function "forgeSeqFiles".
> I know I need to make the seed files so I gave the command line:
>
> forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)
>
> Error: could not find function "forgeSeqFiles".

You need to call forgeBSgenomeDataPkg() to forge a BSgenome data package, not forgeSeqlengthsFile(). Please make sure you follow the instructions in the BSgenomeForge vignette where all the process of forging a BSgenome data package is explained.

>
> I would appreciate it if you could please advice,
> Melo
>
> -- output of sessionInfo():
>
> forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)
>
> Error: could not find function "forgeSeqFiles".

This doesn't look like the output of sessionInfo(). The output of sessionInfo() is... well... the output you get when you run the sessionInfo() command. Please always provide this information.

Thanks!
H.

>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
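
For readers following the advice above to skip the BSgenome package and load one-gene-per-file FASTA sequences directly, here is a minimal sketch of the readDNAStringSet() approach. The directory name, file names and sequences are made-up toy data standing in for the 17,000 files (they are assumptions, not taken from the original post), and the Biostrings step only runs if that package is installed.

```r
# Hypothetical toy data: two tiny FASTA files standing in for the
# 17,000 one-gene-per-file sequences described in the question.
seqs_dir <- file.path(tempdir(), "gene_seqs")
dir.create(seqs_dir, showWarnings = FALSE)
writeLines(c(">geneA", "ATGCGTACGTTAG"), file.path(seqs_dir, "geneA.fa"))
writeLines(c(">geneB", "ATGAAACCCGGGTTTTGA"), file.path(seqs_dir, "geneB.fa"))

# Build the character vector of file paths (all files ending in ".fa").
fasta_files <- list.files(seqs_dir, pattern = "\\.fa$", full.names = TRUE)

# readDNAStringSet() accepts a character vector of file paths and returns
# one DNAStringSet containing every record from every file.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  genes <- Biostrings::readDNAStringSet(fasta_files)
  print(length(genes))  # number of sequences loaded
  print(names(genes))   # names taken from the FASTA header lines
}
```

With real data, seqs_dir would point at the folder holding the gene files and the pattern would match whatever suffix they use. The resulting DNAStringSet can then be used directly, with no BSgenome data package involved.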