Biostrings - pairwiseAlignment trimmed pattern returned
1
0
Entering edit mode
Erik Wright ▴ 210
@erik-wright-4003
Last seen 10.4 years ago
Hello, I have been using the Biostrings function "pairwiseAlignment" lately to perform sequence alignments. I have discovered that with difficult alignments it sometimes returns incomplete patterns. For example: pairwiseAlignment("ACTGACTGACTGACTG","AAGAAGAGTTATGGGAGTAACTGACC") Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] ACT-----------GACTGACTGACT subject: [1] AAGAAGAGTTATGGGAGTAACTGACC score: -77.67886 As you can see, the last character of the pattern ("G") has been removed. If I set type="global-local" this does not happen, but it does in this instance: pairwiseAlignment("ACTGACTGACTGACTG","CTGAGAGGGTGATCGGCCACATTGGG",type ="global-local") Global-Local PairwiseAlignedFixedSubject (1 of 1) pattern: [2] CTGACTGACTGACTG subject: [1] CTGAGAGGGTGATCG score: -31.55991 In this case the first character of the pattern ("A") was removed in the alignment. Please help me to understand why this is happening, and if there might be a solution. Thanks in advance, Erik sessionInfo() R version 2.14.0 (2011-10-31) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] RSQLite_0.10.0 DBI_0.2-5 Biostrings_2.22.0 IRanges_1.12.2 loaded via a namespace (and not attached): [1] tools_2.14.0
Alignment Biostrings Alignment Biostrings • 1.3k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States
Hi Erik, This behavior can be explained by altering the gapOpening and gapExtension arguments. You may want to look up "affine gap penalty" and look at paper referenced on the ?pairwiseAlignment man page. By default gapOpening=-10 and gapExtension=-2. Using the defaults we match your example below, pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC", gapOpening=-10, gapExtension=-2) If we don't penalize the presence or extension of gaps we get an alignment that includes enough gaps to match the sequences. It is not recommended to not penalize gaps, this is for the sake of example. pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC", gapOpening=0, gapExtension=0) The same can be done with your global-local example to get an idea of how these parameters affect the outcome. Additionally, global-local tries to find an alignment that matches the start and the end of one or the other sequence. This is useful for the case where one sequence is downstream of the other and they partially overlap. The global algorithm tries to align every residue in each sequence. You'll want to choose the algorithm that is most appropriate for your data then modify the arguments if necessary. Valerie On 02/15/2012 02:25 PM, Erik Wright wrote: > Hello, > > I have been using the Biostrings function "pairwiseAlignment" lately to perform sequence alignments. I have discovered that with difficult alignments it sometimes returns incomplete patterns. For example: > > pairwiseAlignment("ACTGACTGACTGACTG","AAGAAGAGTTATGGGAGTAACTGACC") > Global PairwiseAlignedFixedSubject (1 of 1) > pattern: [1] ACT-----------GACTGACTGACT > subject: [1] AAGAAGAGTTATGGGAGTAACTGACC > score: -77.67886 > > As you can see, the last character of the pattern ("G") has been removed. If I set type="global-local" this does not happen, but it does in this instance: > > pairwiseAlignment("ACTGACTGACTGACTG","CTGAGAGGGTGATCGGCCACATTGGG",type ="global-local") > Global-Local PairwiseAlignedFixedSubject (1 of 1) > pattern: [2] CTGACTGACTGACTG > subject: [1] CTGAGAGGGTGATCG > score: -31.55991 > > In this case the first character of the pattern ("A") was removed in the alignment. > > Please help me to understand why this is happening, and if there might be a solution. > > Thanks in advance, > Erik > > > sessionInfo() > R version 2.14.0 (2011-10-31) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] RSQLite_0.10.0 DBI_0.2-5 Biostrings_2.22.0 IRanges_1.12.2 > > loaded via a namespace (and not attached): > [1] tools_2.14.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
Thanks Valerie, Your explanation was much better than the one I found in the ?pairwiseAlignment man page. It states: "global" = align whole strings with end gap penalties ... "global- local" = align whole strings in pattern with consecutive subsequence of subject Obviously the "whole strings" were not aligned in my previous email's examples. A quick workaround for this issue that does not require changing the default gap penalties is to pad the subject sequences with multiple gaps on both ends. pairwiseAlignment("ACTGACTGACTGACTG","-- CTGAGAGGGTGATCGGCCACATTGGG--",type="global-local") Global-Local PairwiseAlignedFixedSubject (1 of 1) pattern: [1] ACTGACTGACTGACTG subject: [2] -CTGAGAGGGTGATCG score: -21.21098 Thanks again, Erik On Feb 20, 2012, at 11:53 PM, Valerie Obenchain wrote: > Hi Erik, > > This behavior can be explained by altering the gapOpening and > gapExtension arguments. You may want to look up "affine gap penalty" and > look at paper referenced on the ?pairwiseAlignment man page. > > By default gapOpening=-10 and gapExtension=-2. Using the defaults we > match your example below, > > pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC", > gapOpening=-10, gapExtension=-2) > > If we don't penalize the presence or extension of gaps we get an > alignment that includes enough gaps to match the sequences. It is not > recommended to not penalize gaps, this is for the sake of example. > > pairwiseAlignment("ACTGACTGACTGACTG", "AAGAAGAGTTATGGGAGTAACTGACC", > gapOpening=0, gapExtension=0) > > > The same can be done with your global-local example to get an idea of > how these parameters affect the outcome. Additionally, global-local > tries to find an alignment that matches the start and the end of one or > the other sequence. This is useful for the case where one sequence is > downstream of the other and they partially overlap. The global algorithm > tries to align every residue in each sequence. You'll want to choose the > algorithm that is most appropriate for your data then modify the > arguments if necessary. > > > Valerie > > > > On 02/15/2012 02:25 PM, Erik Wright wrote: >> Hello, >> >> I have been using the Biostrings function "pairwiseAlignment" lately to perform sequence alignments. I have discovered that with difficult alignments it sometimes returns incomplete patterns. For example: >> >> pairwiseAlignment("ACTGACTGACTGACTG","AAGAAGAGTTATGGGAGTAACTGACC") >> Global PairwiseAlignedFixedSubject (1 of 1) >> pattern: [1] ACT-----------GACTGACTGACT >> subject: [1] AAGAAGAGTTATGGGAGTAACTGACC >> score: -77.67886 >> >> As you can see, the last character of the pattern ("G") has been removed. If I set type="global-local" this does not happen, but it does in this instance: >> >> pairwiseAlignment("ACTGACTGACTGACTG","CTGAGAGGGTGATCGGCCACATTGGG",type ="global-local") >> Global-Local PairwiseAlignedFixedSubject (1 of 1) >> pattern: [2] CTGACTGACTGACTG >> subject: [1] CTGAGAGGGTGATCG >> score: -31.55991 >> >> In this case the first character of the pattern ("A") was removed in the alignment. >> >> Please help me to understand why this is happening, and if there might be a solution. >> >> Thanks in advance, >> Erik >> >> >> sessionInfo() >> R version 2.14.0 (2011-10-31) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] RSQLite_0.10.0 DBI_0.2-5 Biostrings_2.22.0 IRanges_1.12.2 >> >> loaded via a namespace (and not attached): >> [1] tools_2.14.0 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6