Question: Question about translate function in Biostrings package
li lilingdu450 wrote:
Dear list,

I'm using the translate() function in the Biostrings package to translate DNA sequences into proteins. It works well when the sequence contains only the bases A, G, C, and T. But when the DNA sequence contains nucleotide ambiguity codes such as "N" or "M", translate() fails. For example:

    translate(DNAString("AACTGTCGMCCC"))
    #Error in translate(DNAStringSet(x)) : not a base at pos 9

    translate(DNAString("AACTGNTCG"))
    #Error in translate(DNAStringSet(x)) : not a base at pos 6

    sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=Chinese_People's Republic of China.936 LC_CTYPE=Chinese_People's Republic of China.936 LC_MONETARY=Chinese_People's Republic of China.936
    [4] LC_NUMERIC=C LC_TIME=Chinese_People's Republic of China.936

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] Biostrings_2.18.2 IRanges_1.8.9

    loaded via a namespace (and not attached):
    [1] Biobase_2.10.0 tools_2.12.1

---
LiGang
modified 6.7 years ago by Hervé Pagès ♦♦ 13k • written 6.7 years ago by li lilingdu450
Hervé Pagès ♦♦ 13k wrote:
Hi LiGang,

It's not clear to me what translate() should do when the input contains ambiguity letters. I can see that for some ambiguities in the input, the output won't be affected. As in your first example, replacing M with either A or C produces the same output:

    > translate(DNAString("AACTGTCGACCC"))
      4-letter "AAString" instance
    seq: NCRP
    > translate(DNAString("AACTGTCGCCCC"))
      4-letter "AAString" instance
    seq: NCRP

So yes, I could add support for this. Otherwise, in general, what to do? Should the output contain letters representing ambiguous amino acids? The problem is that the last time I checked, I was not able to find "official" ambiguity codes for amino acids that would represent all possible ambiguities in the protein sequence resulting from all possible ambiguities in the DNA sequence.

Can you please clarify what your question is?

Thanks,
H.

----- Original Message -----
From: "ligang" <luzifer.li@gmail.com>
To: bioconductor at stat.math.ethz.ch
Sent: Thursday, March 17, 2011 10:23:15 PM
Subject: [BioC] Question about translate function in Biostrings package

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
written 6.7 years ago by Hervé Pagès ♦♦ 13k
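The behaviour discussed in the reply above (accept an ambiguous codon when every possible expansion encodes the same amino acid) can be sketched in base R. This is only an illustration, not how Biostrings implements translate(). The helper names `translate_fuzzy_codon()` and `translate_fuzzy()` are hypothetical, the standard genetic code is assumed, and using "X" for an unresolvable codon is an assumption made here, since the thread notes that no official code covers every case.

```r
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
# with the third base varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
genetic_code <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(genetic_code) <- paste0(grid$b1, grid$b2, grid$b3)

# IUPAC nucleotide codes and the bases each one may stand for.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: translate one codon. Returns the amino acid if all
# expansions of the codon agree, otherwise "X" (an assumption, see above).
translate_fuzzy_codon <- function(codon) {
  codon_letters <- strsplit(codon, "")[[1]]
  choices <- lapply(codon_letters, function(l) strsplit(iupac[[l]], "")[[1]])
  combos <- expand.grid(choices, stringsAsFactors = FALSE)
  expansions <- paste0(combos[[1]], combos[[2]], combos[[3]])
  aa <- unique(genetic_code[expansions])
  if (length(aa) == 1) aa else "X"
}

# Hypothetical helper: translate a whole sequence, codon by codon.
translate_fuzzy <- function(dna) {
  stopifnot(nchar(dna) > 0, nchar(dna) %% 3 == 0)
  starts <- seq(1, nchar(dna), by = 3)
  codons <- substring(dna, starts, starts + 2)
  paste(vapply(codons, translate_fuzzy_codon, character(1)), collapse = "")
}

translate_fuzzy("AACTGTCGMCCC")  # CGM expands to CGA and CGC, both arginine
translate_fuzzy("AACTGNTCG")     # TGN expands to Cys, Trp, and stop codons
```

With these definitions the first call returns "NCRP", matching the output shown in the reply, and the second returns "NXS" because TGN cannot be resolved to a single amino acid.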
Hi,

Here is my understanding of the situation.

DNA sequences sometimes contain ambiguities. Because an amino acid may be encoded by several codons, switching an A for a C, for example, sometimes makes no real difference. That is where ambiguity letters are used. Each organism has a preferred codon for each amino acid, which helps in finding mutations when another codon for the same amino acid is used.

If you simply want an amino acid sequence, replacing the ambiguity letters with one of the possible nucleotides won't make any difference. For phylogenetic analysis, it does make a difference. From what I know of phylogenetic analysis and of what this package does, I don't think that is the intended use here.

One solution is to manually replace each ambiguity letter with one of its corresponding nucleotides. After that, the function will work well.

Another possibility is simply to add a new parameter to the function. You say that there is no universal convention for the ambiguity letters, but the user should know which convention applies to their sequence. So if my understanding is correct, adding new parameters that specify which ambiguity letters may be found, and which nucleotide should replace each of them, should fix the function.

Am I right?

Simon Noël
CdeC

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Pages, Herve [hpages at fhcrc.org]
Sent: 18 March 2011 01:57
To: ligang
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Question about translate function in Biostrings package
written 6.7 years ago by SimonNoël450
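The manual replacement suggested above could look roughly like the following base R sketch. The helper `resolve_ambiguity()` and its `replacement` argument are hypothetical and not part of Biostrings. The sketch only substitutes letters in a plain character string, which can then be passed to DNAString() and translate() as in the examples earlier in the thread.

```r
# IUPAC nucleotide ambiguity codes and the bases each one may stand for.
iupac <- c(M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: replace each ambiguity letter with the base chosen by
# the caller, e.g. replacement = c(M = "A", N = "T").
resolve_ambiguity <- function(dna, replacement) {
  # Every chosen base must be one of the bases its ambiguity code allows.
  for (code in names(replacement)) {
    if (!code %in% names(iupac)) {
      stop("'", code, "' is not an IUPAC ambiguity code")
    }
    allowed <- strsplit(iupac[[code]], "")[[1]]
    if (!replacement[[code]] %in% allowed) {
      stop("'", replacement[[code]], "' is not a valid expansion of '", code, "'")
    }
  }
  # Refuse to continue if any letter would still be ambiguous afterwards.
  leftover <- setdiff(strsplit(dna, "")[[1]],
                      c("A", "C", "G", "T", names(replacement)))
  if (length(leftover) > 0) {
    stop("no replacement given for: ", paste(leftover, collapse = ", "))
  }
  chartr(paste(names(replacement), collapse = ""),
         paste(replacement, collapse = ""), dna)
}

resolve_ambiguity("AACTGTCGMCCC", c(M = "A"))  # "AACTGTCGACCC"
resolve_ambiguity("AACTGNTCG", c(N = "T"))     # "AACTGTTCG"
```

Note that the choice is harmless for the M in the first sequence but not for the N in the second: under the standard genetic code TGT encodes cysteine while TGA is a stop codon, so the replacement base changes the translated protein.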