Question: difference results by needle program and pairwiseAlignment
0
gravatar for wang peter
6.5 years ago by
wang peter2.0k
wang peter2.0k wrote:
i used EMBOSS package needle program got the score 6 pairwiseAlignment(seq1,seq2, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = -10, gapExtension = -0.5) both use the same parameter # Matrix: EBLOSUM62 # Gap_penalty: -10.0 # Extend_penalty: -0.5 sequences are >AAG19119.1 mgitkqvsirsdnfipghrvfderpHILHvcdikvnpednmwefeyqkekdeissrhtkdvylymeldek qhskltsffl eekprdssHSFHvdisgdiiqqvtkrpgagtsdkeknkytpspirfealsdrvvietfagdnpiipyyev thhdtpqdrh nvsriesfiqsmqdggstpgleplptelvselqsaewgeevrHHLHegdecyrnsllhpalssyihaiew alisflkeke dvdiiqqekngdlyyfasgnsilgevqdtgelsqksisrikslnraerrwmghhksgeatkeeldgmrar ltqileelfg tdsark >AAG19157.1 msvaddstddHGHHlpavedwpkgfgeaswwpfitaigaagfyigaalyvlgrgesalvgpmvgpgvfia stfaflagly gwvyhafvaaywsndgggstalrwgmigflgseiatfgagfayyffiragtqwstaasaipdgflgslvv vntailvvss ftlHFAHvalrkgnrsrflallvstlvlgvvfiggqvyeyyefiahegftlsgglfesaffgltglHGLH vtmgavligi imvrglrgqysadrhtsvstvsmywhfvdivwiflvvvlyagsva sionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5 [4] Rsamtools_1.8.6 lattice_0.20-6 Biostrings_2.24.1 [7] GenomicRanges_1.8.13 IRanges_1.14.4 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] Biobase_2.16.0 bitops_1.0-5 grid_2.15.1 hwriter_1.3 [5] stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0 thank you very much -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253
• 698 views
ADD COMMENTlink modified 6.5 years ago by Hervé Pagès ♦♦ 14k • written 6.5 years ago by wang peter2.0k
Answer: difference results by needle program and pairwiseAlignment
0
gravatar for Hervé Pagès
6.5 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:
Hi, On 12/06/2012 01:15 PM, Wang Peter wrote: > i used EMBOSS package needle program got the score 6 > > pairwiseAlignment(seq1,seq2, type = "global", substitutionMatrix = > "BLOSUM62", gapOpening = -10, gapExtension = -0.5) > > both use the same parameter > > # Matrix: EBLOSUM62 > # Gap_penalty: -10.0 > # Extend_penalty: -0.5 2 things: 1. pairwiseAlignment() interprets gapOpening/gapExtension not really the standard way (which is kind of unfortunate). See ?writePairwiseAlignments for how to translate between gapOpening/gapExtension and Gap_penalty/Extend_penalty. So to effectively achieve EMBOSS Gap_penalty and Extend_penalty of -10 and -0.5, respectively, you would need to set 'gapOpening' and 'gapExtension' to 9.5 and 0.5, respectively. 2. However, another kind of strange feature of pairwiseAlignment() (that I was not aware of until now), is that it will only accept non-positive values for gapOpening and gapExtension. More precisely, you can pass positive values, but, in that case, the opposite values will effectively (and silently) be used and stored in the returned object: pa <- pairwiseAlignment(seq1, seq2, type="global", substitutionMatrix="BLOSUM62", gapOpening=9.5, gapExtension=0.5) > pa at gapOpening [1] -9.5 > pa at gapExtension [1] -0.5 So there is no way you can effectively set the 'gapOpening' and 'gapExtension' args to achieve EMBOSS Gap_penalty and Extend_penalty to -10 and -0.5. BUT THE MOST IMPORTANT THING IS THIS: Gap_penalty and Extend_penalty are really aimed to be positive! By setting them to negative values, you encourage gaps i.e. the more gaps the better the final score will be. That means you can align pretty much anything with anything. With your settings, even sequences that have almost nothing in common (like your 2 sequences) get a positive score! I guess this is why the original author of pairwiseAlignment() didn't want to allow positive gapOpening and/or gapExtension values. Maybe we should allow them though, just so that people can experiment and compare results with other tools. But if not, pairwiseAlignment() should raise an error or at least issue a warning that the opposite values are effectively used. What do people think? Thanks, H. > > sequences are >> AAG19119.1 > mgitkqvsirsdnfipghrvfderpHILHvcdikvnpednmwefeyqkekdeissrhtkdvylymeld ekqhskltsffl > eekprdssHSFHvdisgdiiqqvtkrpgagtsdkeknkytpspirfealsdrvvietfagdnpiipyy evthhdtpqdrh > nvsriesfiqsmqdggstpgleplptelvselqsaewgeevrHHLHegdecyrnsllhpalssyihai ewalisflkeke > dvdiiqqekngdlyyfasgnsilgevqdtgelsqksisrikslnraerrwmghhksgeatkeeldgmr arltqileelfg > tdsark >> AAG19157.1 > msvaddstddHGHHlpavedwpkgfgeaswwpfitaigaagfyigaalyvlgrgesalvgpmvgpgvf iastfaflagly > gwvyhafvaaywsndgggstalrwgmigflgseiatfgagfayyffiragtqwstaasaipdgflgsl vvvntailvvss > ftlHFAHvalrkgnrsrflallvstlvlgvvfiggqvyeyyefiahegftlsgglfesaffgltglHG LHvtmgavligi > imvrglrgqysadrhtsvstvsmywhfvdivwiflvvvlyagsva > > sionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 > [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 > [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 > [4] LC_NUMERIC=C > [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5 > [4] Rsamtools_1.8.6 lattice_0.20-6 Biostrings_2.24.1 > [7] GenomicRanges_1.8.13 IRanges_1.14.4 BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] Biobase_2.16.0 bitops_1.0-5 grid_2.15.1 hwriter_1.3 > [5] stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0 > > thank you very much > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENTlink written 6.5 years ago by Hervé Pagès ♦♦ 14k
dear Herve? Thank you for your help my needle setting is default >> # Matrix: EBLOSUM62 >> # Gap_penalty: 10.0 >> # Extend_penalty: 0.5 > and the score is 6 how can i set the pairwiseAlignment Gap_penalty: ? Extend_penalty: ?
ADD REPLYlink written 6.5 years ago by wang peter2.0k
On 12/06/2012 02:15 PM, Wang Peter wrote: > dear Herve? > Thank you for your help > > my needle setting is default >>> # Matrix: EBLOSUM62 >>> # Gap_penalty: 10.0 >>> # Extend_penalty: 0.5 >> > and the score is 6 > > how can i set the pairwiseAlignment > Gap_penalty: ? > Extend_penalty: ? > The answer is in my previous email. We know you are fast but managing to read and understand my previous email + the man page for writePairwiseAlignments (as suggested) + do a little bit of your own experimentation with different settings (type="global", type="global-local", etc...) in order to understand how they affect the output will probably take you more than 9 minutes. Cheers, H. -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 6.5 years ago by Hervé Pagès ♦♦ 14k
thank you very much it should be scorem[i,j]=pairwiseAlignment(sequences[[i]], sequences[[j]], type = "overlap", substitutionMatrix = "BLOSUM62", gapOpening = -9.5, gapExtension = -0.5,scoreOnly=T) i think "overlap" solves (Needleman-Wunsch) global alignment,
ADD REPLYlink written 6.5 years ago by wang peter2.0k
Answer: difference results by needle program and pairwiseAlignment
0
gravatar for David Lapointe
6.5 years ago by
David Lapointe170 wrote:
I am not sure what your issue is. These sequences do not align well at all anyway. Global alignment is probably not a good way to compare them. There are large gaps at each end and 72% of the alignment ( according to needle) are gaps. Does pairwiseAlignment penalize end gaps? What score did pairwiseAlignment report? David -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Wang Peter Sent: Thursday, December 06, 2012 4:15 PM To: bioconductor at r-project.org Subject: [BioC] difference results by needle program and pairwiseAlignment i used EMBOSS package needle program got the score 6 pairwiseAlignment(seq1,seq2, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = -10, gapExtension = -0.5) both use the same parameter # Matrix: EBLOSUM62 # Gap_penalty: -10.0 # Extend_penalty: -0.5 sequences are >AAG19119.1 mgitkqvsirsdnfipghrvfderpHILHvcdikvnpednmwefeyqkekdeissrhtkdvylymeldek qhskltsffl eekprdssHSFHvdisgdiiqqvtkrpgagtsdkeknkytpspirfealsdrvvietfagdnpiipyyev thhdtpqdrh nvsriesfiqsmqdggstpgleplptelvselqsaewgeevrHHLHegdecyrnsllhpalssyihaiew alisflkeke dvdiiqqekngdlyyfasgnsilgevqdtgelsqksisrikslnraerrwmghhksgeatkeeldgmrar ltqileelfg tdsark >AAG19157.1 msvaddstddHGHHlpavedwpkgfgeaswwpfitaigaagfyigaalyvlgrgesalvgpmvgpgvfia stfaflagly gwvyhafvaaywsndgggstalrwgmigflgseiatfgagfayyffiragtqwstaasaipdgflgslvv vntailvvss ftlHFAHvalrkgnrsrflallvstlvlgvvfiggqvyeyyefiahegftlsgglfesaffgltglHGLH vtmgavligi imvrglrgqysadrhtsvstvsmywhfvdivwiflvvvlyagsva sionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5 [4] Rsamtools_1.8.6 lattice_0.20-6 Biostrings_2.24.1 [7] GenomicRanges_1.8.13 IRanges_1.14.4 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] Biobase_2.16.0 bitops_1.0-5 grid_2.15.1 hwriter_1.3 [5] stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0 thank you very much -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253 _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 6.5 years ago by David Lapointe170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 267 users visited in the last hour