RE: Question about translate function in Biostrings package
li lilingdu ▴ 450
@li-lilingdu-1884
Last seen 5.9 years ago
Simon Noël <simon.noel.2 at ...> writes:

>
> Hi,
>
> Here is my understanding of the situation.
>
> DNA sequences sometimes contain ambiguities. Because an amino acid (aa)
> may be encoded by several codons, switching an A for a C, for example,
> sometimes makes no real difference. That is where ambiguity letters are
> used. Each organism has a preferred codon for each aa, which helps to
> find mutations when another codon for the same aa is used. If you simply
> want an aa sequence, replacing the ambiguity letters by one of the
> possible nucleotides won't make any difference. For phylogenetic
> analysis, it does make a difference. From what I know of phylogenetic
> analysis and of what this package does, I think that is not what is
> intended here.
>
> One solution is to manually replace each ambiguity letter by one of its
> corresponding nucleotides. After that, the function will work well. But
> another possibility is simply to add a new parameter to it. You say that
> there is no universal convention for the ambiguity letters, but the user
> should know the convention used for his sequence. So if my understanding
> is correct, adding new parameters to specify which ambiguity letters may
> be found, and which nucleotide to replace each one with, should fix the
> function.
>
> Am I right?
>
> Simon Noël
> CdeC
>
> ________________________________________
> From: bioconductor-bounces at ... [bioconductor-bounces at ...] on behalf of Pages, Herve [hpages at ...]
> Sent: 18 March 2011 01:57
> To: ligang
> Cc: bioconductor at ...
> Subject: Re: [BioC] Question about translate funciton in Biostrings package
>
> Hi LiGang,
>
> It's not clear to me what translate() should do when the input
> contains ambiguity letters. I can see that for some ambiguities
> in the input, the output won't be affected. Like in your first
> example, replacing M by either A or C produces the same output:
>
> > translate(DNAString("AACTGTCGACCC"))
> 4-letter "AAString" instance
> seq: NCRP
> > translate(DNAString("AACTGTCGCCCC"))
> 4-letter "AAString" instance
> seq: NCRP
>
> So yes I could add support for this.
>
> Otherwise, in general, what to do? Should the output contain letters
> representing ambiguous amino acids? The problem is that last time I
> checked I was not able to find "official" ambiguity codes for amino
> acids that would represent all possible ambiguities in the protein
> sequence resulting from all possible ambiguities in the DNA sequence.
>
> Can you please clarify what your question is?
>
> Thanks,
> H.
>
> ----- Original Message -----
> From: "ligang" <luzifer.li at ...>
> To: bioconductor at ...
> Sent: Thursday, March 17, 2011 10:23:15 PM
> Subject: [BioC] Question about translate funciton in Biostrings package
>
> Dear list,
>
> I'm using the "translate" function in the "Biostrings" package to
> translate DNA sequences into proteins.
>
> It works well when the base letters are "A/G/C/T".
>
> But when the DNA sequence contains nucleotide ambiguity codes such as
> "N" or "M", the "translate" function does not work, for example:
>
> translate(DNAString("AACTGTCGMCCC"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 9
>
> translate(DNAString("AACTGNTCG"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 6
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936
> LC_CTYPE=Chinese_People's Republic of China.936
> LC_MONETARY=Chinese_People's Republic of China.936
> [4] LC_NUMERIC=C
> LC_TIME=Chinese_People's Republic of China.936
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.18.2 IRanges_1.8.9
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.1
>
> ---
> LiGang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at ...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

Some tools, such as the translate tool at 'http://expasy.org/tools/dna.html', return "X" for the DNA string "TTN". My question is whether

    translate(DNAString("TTN"))

could return "X" as well. In the Biostrings package, "X" is already an acceptable letter of an AAString, for example:

    AAString("XXXARN")

Of course, it would be even better if the 'translate' function could be more flexible, for example:

    translate(DNAString("TCN"))
    ## "TCA", "TCG", "TCC" and "TCT" all translate to 'Ser'; could the above command return "S"?

    translate(DNAString("TTY"))
    ## "TTC" and "TTT" both translate to 'Phe'; could the above command return "F"?

---
LiGang
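The behaviour requested above can be sketched in base R, without Biostrings. This is only an illustration of the idea, not how translate() is or would be implemented: the function name `translate_fuzzy`, its `unresolved` argument, and the `iupac` and `genetic_code` tables are all hypothetical names introduced for this sketch. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes, expands each codon into every concrete codon it could stand for, and emits the amino acid only if all of them agree, falling back to "X" otherwise.

```r
# Standard genetic code, built in TCAG order (assumption: standard code only).
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
genetic_code <- setNames(aa, codons)

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
  S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
  V = c("A", "C", "G"), H = c("A", "C", "T"),
  D = c("A", "G", "T"), B = c("C", "G", "T"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper: translate a DNA string, resolving ambiguous codons
# when every possible reading gives the same amino acid.
translate_fuzzy <- function(dna, unresolved = "X") {
  dna <- toupper(dna)
  stopifnot(nchar(dna) > 0, nchar(dna) %% 3 == 0)
  starts <- seq(1, nchar(dna), by = 3)
  codon_list <- substring(dna, starts, starts + 2)
  residues <- vapply(codon_list, function(codon) {
    letters3 <- strsplit(codon, "")[[1]]
    if (!all(letters3 %in% names(iupac))) {
      stop("not an IUPAC nucleotide letter in codon ", codon)
    }
    # Every concrete codon this (possibly ambiguous) codon could represent.
    combos <- expand.grid(unname(iupac[letters3]), stringsAsFactors = FALSE)
    concrete <- paste0(combos[[1]], combos[[2]], combos[[3]])
    candidates <- unique(unname(genetic_code[concrete]))
    if (length(candidates) == 1) candidates else unresolved
  }, character(1), USE.NAMES = FALSE)
  paste(residues, collapse = "")
}

translate_fuzzy("AACTGTCGMCCC")  # "NCRP": CGA and CGC both give R
translate_fuzzy("TCN")           # "S"
translate_fuzzy("TTY")           # "F"
translate_fuzzy("TTN")           # "X": TTT/TTC give F, TTA/TTG give L
```

With this approach "X" only marks codons whose readings disagree, so it avoids the need for a full set of amino-acid ambiguity codes, which is the difficulty raised earlier in the thread.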