trouble reading DNA stringset from keggGet function
Guest User ★ 13k
@guest-user-4897
I am having some difficulty making fasta files out of files returned by the keggGet function in the KEGGREST package. The file returned is apparently a DNA string set, but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so much for any help!

-- output of sessionInfo():

> genes<-keggLink("ath00906")
> head(genes)
     [,1]            [,2]            [,3]
[1,] "path:ath00906" "ath:AT1G06820" "reverse"
[2,] "path:ath00906" "ath:AT1G08550" "reverse"
[3,] "path:ath00906" "ath:AT1G10830" "reverse"
[4,] "path:ath00906" "ath:AT1G30100" "reverse"
[5,] "path:ath00906" "ath:AT1G31800" "reverse"
[6,] "path:ath00906" "ath:AT1G52340" "reverse"
> sequences<-keggGet(genes[1:10,2],"ntseq")
> head(sequences)
  A DNAStringSet instance of length 6
    width seq                                  names
[1]  1788 ATGGATTTGTGTTTTC...AGGACACTCGCATAG   ath:AT1G06820 CRT...
[2]  1389 ATGGCAGTAGCTACAC...AGGAAGGTCAGGTAG   ath:AT1G08550 NPQ...
[3]   858 ATGGCGGTTTATCATC...ATTGGATTTTTATGA   ath:AT1G10830 Z-I...
[4]  1770 ATGGCTTGTTCTTACA...TTAAACCAGGCTTAA   ath:AT1G30100 NCE...
[5]  1788 ATGGCTATGGCCTTTC...TCTGCTCTTTCTTAA   ath:AT1G31800 CYP...
[6]   858 ATGTCAACGAACACTG...AAAGTCTTCAGATGA   ath:AT1G52340 ABA...
> readDNAStringSet(sequences,"fasta")
Error in .normargInputFilepath(filepath) :
  'filepath' must be a character vector with no NAs
> class(sequences) #confirm that the input is a DNA string set
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"

-- Sent via the guest posting facility at bioconductor.org.
PROcess KEGGREST • 1.8k views
@james-w-macdonald-5106
Hi Elliot,

> library("KEGGREST")
> genes<-keggLink("ath00906")
> sequences<-keggGet(genes[1:10,2],"ntseq")
> writeXStringSet(sequences, "./tmp.fasta")
> scan("tmp.fasta", "c", nlines=2, sep = "\t") ## check it
Read 2 items
[1] ">ath:AT1G06820 CRTISO; carotenoid isomerase; K09835 prolycopene isomerase [EC:5.2.1.13] (N)"
[2] "ATGGATTTGTGTTTTCAAAATCCCGTAAAGTGTGGTGATCGTTTGTTCTCCGCATTGAATACCTCTACGTATTACAAGCT"

Best,

Jim

On Tuesday, September 10, 2013 1:55:20 PM, Elliot [guest] wrote:
> I am having some difficulty making fasta files out of files returned by the keggGet function in the KEGGREST package. The file returned is apparently a DNA string set, but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so much for any help!

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
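The error in the question comes from the direction of the call: readDNAStringSet() expects a file path as its first argument, while sequences is already a DNAStringSet held in memory, so writeXStringSet() is the function that puts it on disk. The round trip can be sketched without a network connection as below. The sequences, the names geneA and geneB, and the temporary file path are made-up stand-ins for what keggGet(..., "ntseq") returns, not real KEGG data.

```r
library(Biostrings)

# Toy stand-in for the object returned by keggGet(..., "ntseq");
# the sequences and the names geneA/geneB are hypothetical.
sequences <- DNAStringSet(c(geneA = "ATGGATTTGTGA",
                            geneB = "ATGGCAGTAGCTTAG"))

# Write the in-memory DNAStringSet to a FASTA file (FASTA is the default format).
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(sequences, fasta_path)

# Read it back: the first argument is a file path, not a DNAStringSet.
roundtrip <- readDNAStringSet(fasta_path, format = "fasta")
```

writeXStringSet() accepts any XStringSet, so the same call should also cover an AAStringSet of protein sequences, which is presumably what the amino acid attempt in the question produced.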