fasta biostrings bioconductor

Guest User

@guest-user-4897

I posted this same quandary on Biostars and Stack Overflow.

I am attempting to import a FASTA file of sequences into R using Bioconductor's 'Biostrings' package and the 'DNAStringSet' function, but I keep getting the same error:

Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
  key 112 (char 'p') not in lookup table

My FASTA file ("FileName.fa") consists of sequences of various lengths, in the following format:

>GeneNameOne
CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA
>GeneNameTwo
CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC
...etc

I ran 'grep p FileName.fa' in the Unix terminal, but it returned no output.

Does anyone have an idea what is going on?

Thanks in advance.

-- output of sessionInfo():

Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
  key 112 (char 'p') not in lookup table

--
Sent via the guest posting facility at bioconductor.org.

3.9k views

Written 10.1 years ago by Guest User; last updated 10.1 years ago by Hervé Pagès

Hervé Pagès

@herve-pages-1542

Seattle, WA, United States

Hi there,

I guess you're trying to use DNAStringSet() on a file name that contains a "p", which of course is not going to work (and even if it worked, it wouldn't do what you're trying to do). To read a FASTA file, use readDNAStringSet(), not the DNAStringSet() constructor function.

Cheers,
H.

(Replying to the original post of 03/28/2014 09:43 AM.)

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA
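A minimal sketch of the distinction Hervé draws, assuming Biostrings is installed. `DNAStringSet()` encodes the characters of whatever string it is given, so passing a path makes it check the path's own characters against the DNA alphabet; `readDNAStringSet()` instead opens the file and parses the FASTA records. The temporary file and its two records are illustrative (copied from the question's example), and `tryCatch()` is there only so the failing constructor call does not stop the script.

```r
library(Biostrings)

# Illustrative FASTA file in the format shown in the question
fa <- tempfile(fileext = ".fa")
writeLines(c(">GeneNameOne",
             "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA",
             ">GeneNameTwo",
             "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"), fa)

# Wrong: the constructor treats the path string itself as sequence data,
# so characters such as '/' or 'p' that are not DNA letters are rejected
bad <- tryCatch(DNAStringSet(fa), error = function(e) conditionMessage(e))

# Right: readDNAStringSet() reads and parses the FASTA file
seqs <- readDNAStringSet(fa)
names(seqs)  # "GeneNameOne" "GeneNameTwo"
width(seqs)  # 53 43
```

With the real file, the fix is simply `seqs <- readDNAStringSet("FileName.fa")`.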

Posted 10.1 years ago by Hervé Pagès

Martin Morgan

@martin-morgan-1513

United States

On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
> I performed 'grep p FileName.fa' in the Unix terminal, but I received no output.

You could try a divide-and-conquer approach: split the file into two, read each half, choose the half with the problem, and continue. Please continue reading below...

> -- output of sessionInfo():
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
> key 112 (char 'p') not in lookup table

Rather than repeating the error without context, it is usually helpful to cut and paste the relevant portions of the session that cause problems, e.g.,

    > library(Biostrings)
    > readLines("FileName.fa", 4)  ## correct file?
    [1] ">GeneNameOne"
    [2] "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA"
    [3] ">GeneNameTwo"
    [4] "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
    > readDNAStringSet("FileName.fa")
    Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
      key 112 (char 'p') not in lookup table

The information being asked for here is the output of the command sessionInfo(), so that basic information about your system is available; here's mine:

    > library(Biostrings)
    > sessionInfo()
    R version 3.0.2 Patched (2014-01-02 r64626)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] Biostrings_2.30.1  XVector_0.2.0      IRanges_1.20.6     BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
    [1] stats4_3.0.2

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N., PO Box 19024, Seattle, WA 98109

Posted 10.1 years ago by Martin Morgan


Hello guest with no name,

Have you tried something simple?

    library(ShortRead)
    mysequences <- readFasta('FileName.fa')

Cheers,
Ivan

Ivan Gregoretti, PhD
Bioinformatics

(Replying to Martin Morgan's message of Fri, Mar 28, 2014 at 12:56 PM.)

Reply posted 10.1 years ago by Ivan Gregoretti

Malcolm Cook

@malcolm-cook-6293

United States

Hi,

Just a thought... Did you run the grep with the -i option for case insensitivity? If you find a "P", then look again and see whether you have any control-As in that file. If you do, I'm guessing that file came from NCBI. If it did, then know this: NCBI uses control-A to separate multi-line deflines in FASTA files.

That's all I got,
Malcolm Cook

(Replying to the original post, sent Friday, March 28, 2014 11:43 AM, subject "[BioC] fasta biostrings bioconductor".)
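Malcolm's two checks (a case-insensitive search for the letter and a search for control-A characters) can also be run from inside R. This is a sketch only; the helper name `scan_fasta_for_clues()` is hypothetical, and it just reports line numbers, leaving the interpretation to the reader.

```r
# Hypothetical helper: the R counterpart of `grep -i p FileName.fa`, plus a
# search for control-A (\001), which per Malcolm NCBI uses between deflines
scan_fasta_for_clues <- function(path, letter = "p") {
  lines <- readLines(path, warn = FALSE)
  list(
    # Case-insensitive match, like grep -i
    letter_lines = grep(letter, lines, ignore.case = TRUE),
    # Literal control-A byte anywhere on the line
    ctrl_a_lines = grep("\001", lines, fixed = TRUE, useBytes = TRUE)
  )
}

# Demo: a merged NCBI-style defline on line 1 and an uppercase 'P' on line 4
fa <- tempfile(fileext = ".fa")
writeLines(c(">seq1 first title\001seq2 second title", "ACGT",
             ">GeneTwo", "ACPGT"), fa)
str(scan_fasta_for_clues(fa))
```

Like the shell `grep`, this searches header lines as well as sequence lines.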

Posted 10.1 years ago by Malcolm Cook
