RE: Question about translate function in Biostrings package


li lilingdu


Simon Noël <simon.noel.2 at ...> writes:
>
> Hi,
>
> Here is my understanding of the situation.
>
> In DNA, there are sometimes ambiguities in the nucleic acid sequence. Because an amino acid may have several codons, switching an A for a C, for example, sometimes makes no real difference. That is where ambiguity letters are used. Each organism has a preferred codon for each amino acid, and that helps to find mutations when another codon for the same amino acid is used. If you simply want an amino acid sequence, replacing the ambiguity letters by one of the possible bases won't make any difference. If it's for phylogenetic analysis, there is a difference. From what I know of phylogenetic analysis and of what this package does, I think that's not what is intended here.
>
> One solution would be to manually replace each ambiguity letter by one of its corresponding nucleotides. After that, the function will work well. But another possibility is simply to add a new parameter to it. You say that there is no universal convention for the ambiguity letters, but the user should know the convention used for his sequence. So if my understanding is correct, adding new parameters to specify which ambiguity letters may be found, and by which nucleotide each should be replaced, should fix the function.
>
> Am I right?
>
> Simon Noël
> CdeC
>
> ________________________________________
> From: bioconductor-bounces at ... [bioconductor-bounces at ...] on behalf of Pages, Herve [hpages at ...]
> Sent: 18 March 2011 01:57
> To: ligang
> Cc: bioconductor at ...
> Subject: Re: [BioC] Question about translate function in Biostrings package
>
> Hi LiGang,
>
> It's not clear to me what translate() should do when the input contains ambiguity letters. I can see that for some ambiguities in the input, the output won't be affected.
> Like in your first example, replacing M by either A or C produces the same output:
>
> > translate(DNAString("AACTGTCGACCC"))
> 4-letter "AAString" instance
> seq: NCRP
> > translate(DNAString("AACTGTCGCCCC"))
> 4-letter "AAString" instance
> seq: NCRP
>
> So yes, I could add support for this.
>
> Otherwise, in general, what should it do? Should the output contain letters representing ambiguous amino acids? The problem is that last time I checked I was not able to find "official" ambiguity codes for amino acids that would represent all possible ambiguities in the protein sequence resulting from all possible ambiguities in the DNA sequence.
>
> Can you please clarify what your question is?
>
> Thanks,
> H.
>
> ----- Original Message -----
> From: "ligang" <luzifer.li at ...>
> To: bioconductor at ...
> Sent: Thursday, March 17, 2011 10:23:15 PM
> Subject: [BioC] Question about translate function in Biostrings package
>
> Dear list,
>
> I'm using the "translate" function in the "Biostrings" package to translate DNA sequences into proteins.
>
> It worked well when the bases were "A/G/C/T".
>
> But when the DNA sequence contained nucleotide ambiguity codes such as "N"/"M", the "translate" function did not work, for example:
>
> > translate(DNAString("AACTGTCGMCCC"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 9
> > translate(DNAString("AACTGNTCG"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 6
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936  LC_CTYPE=Chinese_People's Republic of China.936  LC_MONETARY=Chinese_People's Republic of China.936
> [4] LC_NUMERIC=C  LC_TIME=Chinese_People's Republic of China.936
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.18.2 IRanges_1.8.9
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.1
>
> ---
> LiGang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at ...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Some tools, such as the translate tool at 'http://expasy.org/tools/dna.html', return "X" for the DNAString "TTN". So my question is: could translate(DNAString("TTN")) return "X"?
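
The workaround Simon describes in the quoted thread (replace each ambiguity letter with one base the user picks, then translate as usual) can be sketched in base R. `resolve_ambiguities()` is a hypothetical helper, not a Biostrings function, and it does not check that the chosen base is one the ambiguity letter actually allows. Note that it commits to a single expansion: resolving TTN with N = "A" yields TTA, which translates to L, whereas the expasy tool reports X because TTN can also encode F.

```r
# Hypothetical helper (not part of Biostrings): replace each IUPAC ambiguity
# letter with one base chosen by the user, as suggested in the quoted thread.
# `choices` maps ambiguity letters to replacement bases, e.g. c(M = "A", N = "C").
# Limitation: it does not verify that the chosen base is compatible with the
# letter (M -> G would be accepted even though M means A or C).
resolve_ambiguities <- function(seq, choices) {
  stopifnot(!is.null(names(choices)),
            all(nchar(names(choices)) == 1),
            all(choices %in% c("A", "C", "G", "T")))
  # chartr() substitutes character by character: the i-th letter of `old`
  # becomes the i-th letter of `new` everywhere in the string.
  chartr(old = paste(names(choices), collapse = ""),
         new = paste(choices, collapse = ""),
         x   = toupper(seq))
}

resolve_ambiguities("AACTGTCGMCCC", c(M = "A"))  # "AACTGTCGACCC"
resolve_ambiguities("TTN", c(N = "A"))           # "TTA", i.e. Leu, not "X"
```

The returned string can then be wrapped in DNAString() and passed to translate().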

In the Biostrings package, "X" is an acceptable letter of AAString, for example:

AAString("XXXARN")

Of course, it would be better if the translate function were more flexible, for example:

translate(DNAString("TCN"))  ## "TCA", "TCG", "TCC" and "TCT" all translate to Ser; could this command return "S"?
translate(DNAString("TTY"))  ## both "TTC" and "TTT" translate to Phe; could this command return "F"?

---
LiGang
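
A minimal base-R sketch of the behaviour asked for here: expand each codon over every base its IUPAC letters allow, return the amino acid if all expansions agree (TCN gives S, TTY gives F, and CGM gives R as in Hervé's example), and "X" otherwise (TTN, like the expasy tool). The names `translate_fuzzy()`, `expand_codon()` and `standard_code` are hypothetical; the table hard-codes the standard genetic code only, and the result is a plain character string rather than an AAString. Later Biostrings releases added an `if.fuzzy.codon` argument to `translate()` for this situation; check `?translate` in your installed version.

```r
# Hypothetical sketch, not the Biostrings API: translate DNA containing IUPAC
# ambiguity codes with the standard genetic code. A codon whose expansions all
# encode the same amino acid gets that letter; otherwise it gets `fuzzy`.

bases <- c("T", "C", "A", "G")
# All 64 codons in TCAG order: first base varies slowest, third base fastest.
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
standard_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  codons)

# IUPAC nucleotide codes and the bases each one stands for.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
              K = c("G", "T"), M = c("A", "C"), B = c("C", "G", "T"),
              D = c("A", "G", "T"), H = c("A", "C", "T"), V = c("A", "C", "G"),
              N = c("A", "C", "G", "T"))

# Every unambiguous codon that a (possibly ambiguous) codon could stand for.
expand_codon <- function(codon) {
  letters3 <- strsplit(codon, "")[[1]]
  unknown <- setdiff(letters3, names(iupac))
  if (length(unknown) > 0) stop("not an IUPAC letter: ", paste(unknown, collapse = ", "))
  # Build all combinations position by position: prefixes x next allowed bases.
  Reduce(function(prefixes, choices) as.vector(outer(prefixes, choices, paste0)),
         iupac[letters3])
}

translate_fuzzy <- function(dna, fuzzy = "X") {
  dna <- toupper(dna)
  n <- nchar(dna)
  stopifnot(n %% 3 == 0)
  if (n == 0) return("")
  starts <- seq(1, n, by = 3)
  codon_vec <- substring(dna, starts, starts + 2)
  aa <- vapply(codon_vec, function(codon) {
    candidates <- unique(unname(standard_code[expand_codon(codon)]))
    if (length(candidates) == 1) candidates else fuzzy
  }, character(1), USE.NAMES = FALSE)
  paste(aa, collapse = "")
}

translate_fuzzy("TTN")  # "X": TTA/TTG give L, TTC/TTT give F
translate_fuzzy("TCN")  # "S"
translate_fuzzy("TTY")  # "F"
```

Because "X" and "*" are valid AAString letters, the result can be wrapped in AAString() if a Biostrings object is needed.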
