Question: fasta biostrings bioconductor
Guest User wrote (4.2 years ago):
I posted this same question on Biostars and Stack Overflow.

I am trying to import a FASTA file of sequences into R using the 'DNAStringSet' function from Bioconductor's 'Biostrings' package, but I keep getting the same error:

    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

My FASTA file ("FileName.fa") consists of sequences of various lengths, in the following format:

    >GeneNameOne
    CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA
    >GeneNameTwo
    CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC
    ...etc

I ran 'grep p FileName.fa' in the Unix terminal, but it produced no output.

Does anyone have an idea what is going on?

Thanks in advance.

-- output of sessionInfo():

    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

--
Sent via the guest posting facility at bioconductor.org.
Hervé Pagès wrote (4.2 years ago):
Hi there,

I guess you're trying to use DNAStringSet() on a file name that contains a "p", which of course is not going to work (and even if it did work, it wouldn't do what you're trying to do). To read a FASTA file, use readDNAStringSet(), not the DNAStringSet constructor function.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
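To make the difference between the two functions concrete, here is a minimal sketch, assuming the Biostrings package is installed. The temporary file and the variable names (fasta_path, seqs, constructor_result) are made up for illustration, and the exact wording of the error message may differ between Biostrings versions.

```r
library(Biostrings)

# Write a small two-record FASTA file to a temporary path (sequences copied
# from the question; the path itself is hypothetical).
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(
  ">GeneNameOne",
  "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA",
  ">GeneNameTwo",
  "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
), fasta_path)

# readDNAStringSet() opens the file and parses its records.
seqs <- readDNAStringSet(fasta_path)
names(seqs)   # record names taken from the ">" header lines
width(seqs)   # length of each sequence

# DNAStringSet() treats a character vector as sequence data, so passing a
# path makes it try to encode the characters of the path itself as DNA.
# The error is captured here so the script keeps running.
constructor_result <- tryCatch(
  DNAStringSet(fasta_path),
  error = function(e) conditionMessage(e)
)
constructor_result
```

This would also explain why grep finds no "p" in the file: under this reading, the offending character sits in the string passed to the constructor, not in the file contents.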
Martin Morgan wrote (4.2 years ago):
On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
> I performed 'grep p FileName.fa' in the Unix terminal, but I received no output.

You could try a divide-and-conquer approach: split the file in two, read each half, pick the half that shows the problem, and repeat. Please continue reading below...

> -- output of sessionInfo():
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
> key 112 (char 'p') not in lookup table

Rather than repeating the error without context, it is usually helpful to cut-and-paste the relevant portions of the session that causes problems, e.g.,

    > library(Biostrings)
    > readLines("FileName.fa", 4) ## correct file?
    [1] "> GeneNameOne"
    [2] "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA"
    [3] "> GeneNameTwo"
    [4] "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
    > readDNAStringSet("FileName.fa")
    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

The information being asked for here is the output of the command sessionInfo(), so that basic information about your system is available; here's mine:

    > library(Biostrings)
    > sessionInfo()
    R version 3.0.2 Patched (2014-01-02 r64626)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] Biostrings_2.30.1  XVector_0.2.0      IRanges_1.20.6     BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] stats4_3.0.2

--
Computational Biology / Fred Hutchinson Cancer Research Center
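As a complement to splitting the file by hand, the search for an offending character can be automated. The sketch below is not the bisection described above; it is a simpler linear scan in base R that reports every sequence line containing a character outside the IUPAC nucleotide letters plus the gap symbols "-", "+" and ".". The function name find_bad_lines is hypothetical, and the accepted character set is an assumption about what the DNA alphabet allows.

```r
# Report sequence lines that contain characters outside the assumed DNA
# alphabet. Header lines (starting with ">") are skipped because they may
# legitimately contain any text.
find_bad_lines <- function(path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  # Negated character class: matches any single character that is NOT an
  # IUPAC nucleotide code (either case) or one of ".", "+", "-".
  bad <- !is_header & grepl("[^ACGTMRWSYKVHDBNacgtmrwsykvhdbn.+-]", lines)
  data.frame(line = which(bad), text = lines[bad], stringsAsFactors = FALSE)
}
```

Calling find_bad_lines("FileName.fa") returns a data frame with the line numbers and contents of any suspicious lines; an empty result suggests the problem lies elsewhere, for example in how the file is being passed to the reader.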
Ivan Gregoretti replied (4.2 years ago):

Hello guest with no name,

Have you tried something simple?

    library(ShortRead)
    mysequences <- readFasta('FileName.fa')

Cheers,
Ivan

Ivan Gregoretti, PhD
Bioinformatics
Malcolm Cook wrote (4.2 years ago):
Hi,

Just a thought...

Did you run the grep with the -i option for case insensitivity?

If you find a "P", look again and check whether the file contains any control-A characters.

If it does, I'm guessing the file came from NCBI.

If so, know this: NCBI uses control-A to separate multi-line deflines in FASTA files.

That's all I got,

Malcolm Cook
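Both checks can also be run from inside R. This is a minimal sketch in base R, assuming the file is small enough to read into memory; the function name check_fasta_lines and the names of the returned list elements are made up for illustration.

```r
# Return the line numbers that contain a "p" in either case, and the line
# numbers that contain a control-A character (byte 0x01, written "\001").
check_fasta_lines <- function(path) {
  lines <- readLines(path)
  list(
    any_case_p = which(grepl("p", lines, ignore.case = TRUE)),
    control_a  = which(grepl("\001", lines, fixed = TRUE))
  )
}
```

The first element corresponds to running grep with -i; the second flags header lines in which several deflines have been joined with control-A.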