Question: obtaining a complete global alignment via pairwiseAlignment
2
gravatar for Guest User
6.5 years ago by
Guest User12k
Guest User12k wrote:
Hello, I'm interested in using the pairwiseAlignment function in Biostrings to perform simple alignments of short sequences. I noticed that even the global alignment mode returns an alignment with gapped ends trimmed off. For example: > s1 = DNAString ("AAGGAA") > s2 = DNAString ("GG") > pairwiseAlignment (s1, s2, type = "global") Global PairwiseAlignmentsSingleSubject (1 of 1) pattern: [3] GG subject: [1] GG score: -32.03649 Is it possible to obtain the complete alignment including the (possibly) gapped ends? In this case, that would be: --GG-- AAGGAA For my purposes, having the complete gapped alignment is important. My question seems related to a previous post: https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html Thank you in advance. Best wishes, Rob -- Robert K. Bradley, Assistant Member Fred Hutchinson Cancer Research Center http://labs.fhcrc.org/bradleyr/ -- output of sessionInfo(): > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] C attached base packages: [1] parallel grid grDevices utils datasets stats graphics methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.28.0 org.Hs.eg.db_2.9.0 beanplot_1.1 [5] IlluminaHumanMethylation450kmanifest_0.4.0 goseq_1.12.0 geneLenDataBase_0.99.11 BiasedUrn_1.05 [9] GO.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7 fastcluster_1.1.11 [13] minfi_1.6.0 reshape_0.8.4 plyr_1.8 lattice_0.20-15 [17] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.2 Biostrings_2.28.0 GenomicFeatures_1.12.2 [21] AnnotationDbi_1.22.6 Biobase_2.20.0 GenomicRanges_1.12.4 IRanges_1.18.1 [25] VennDiagram_1.6.0 RColorBrewer_1.0-5 BiocGenerics_0.6.0 gplots_2.11.0.1 [29] MASS_7.3-26 KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 [33] gtools_2.7.1 loaded via a namespace (and not attached): [1] Matrix_1.0-12 R.methodsS3_1.4.2 RCurl_1.95-4.1 Rsamtools_1.12.3 XML_3.95-0.2 biomaRt_2.16.0 bitops_1.0-5 illuminaio_0.2.0 [9] limma_3.16.5 matrixStats_0.8.1 mclust_4.1 mgcv_1.7-24 multtest_2.16.0 nlme_3.1-109 nor1mix_1.1-4 preprocessCore_1.22.0 [17] siggenes_1.34.0 splines_3.0.1 stats4_3.0.1 survival_2.37-4 tools_3.0.1 zlibbioc_1.6.0 -- Sent via the guest posting facility at bioconductor.org.
alignment go cancer bsgenome • 615 views
ADD COMMENTlink modified 6.5 years ago by Martin Morgan ♦♦ 24k • written 6.5 years ago by Guest User12k
Answer: obtaining a complete global alignment via pairwiseAlignment
0
gravatar for Martin Morgan
6.5 years ago by
Martin Morgan ♦♦ 24k
United States
Martin Morgan ♦♦ 24k wrote:
On 06/14/2013 07:57 PM, Robert K.Bradley [guest] wrote: > > Hello, > > I'm interested in using the pairwiseAlignment function in Biostrings to perform simple alignments of short sequences. I noticed that even the global alignment mode returns an alignment with gapped ends trimmed off. For example: > >> s1 = DNAString ("AAGGAA") >> s2 = DNAString ("GG") >> pairwiseAlignment (s1, s2, type = "global") > Global PairwiseAlignmentsSingleSubject (1 of 1) > pattern: [3] GG > subject: [1] GG > score: -32.03649 > > Is it possible to obtain the complete alignment including the (possibly) gapped ends? In this case, that would be: > --GG-- > AAGGAA > For my purposes, having the complete gapped alignment is important. Hi Robert -- Not an expert on pairwiseAlignment, but I think the return value is a class that contains more information than it is showing > xx = pairwiseAlignment(s2, s1, type="global") > class(xx) [1] "PairwiseAlignmentsSingleSubject" attr(,"package") [1] "Biostrings" > class?PairwiseAlignmentsSingleSubject > aligned(xx) A DNAStringSet instance of length 1 width seq [1] 6 --GG-- Martin > > My question seems related to a previous post: > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html > > Thank you in advance. > > Best wishes, > Rob > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENTlink written 6.5 years ago by Martin Morgan ♦♦ 24k
Thanks Martin. as.character (aligned (...)) does return a gapped string for one of the two aligned sequences, but even that command trims off unaligned characters. For example: For example: > s1 = DNAString ("AGGGG") > s2 = DNAString ("GGGGA") > xx = pairwiseAlignment (s1, s2, gapOpening = -1) > as.character (xx) [1] "GGGG-" while the whole gapped string is "AGGGG-". I apologize in advance if I'm missing something obvious here. Best, Rob On 6/15/13 6:45 AM, Martin Morgan wrote: > On 06/14/2013 07:57 PM, Robert K.Bradley [guest] wrote: >> >> Hello, >> >> I'm interested in using the pairwiseAlignment function in Biostrings >> to perform simple alignments of short sequences. I noticed that even >> the global alignment mode returns an alignment with gapped ends >> trimmed off. For example: >> >>> s1 = DNAString ("AAGGAA") >>> s2 = DNAString ("GG") >>> pairwiseAlignment (s1, s2, type = "global") >> Global PairwiseAlignmentsSingleSubject (1 of 1) >> pattern: [3] GG >> subject: [1] GG >> score: -32.03649 >> >> Is it possible to obtain the complete alignment including the >> (possibly) gapped ends? In this case, that would be: >> --GG-- >> AAGGAA >> For my purposes, having the complete gapped alignment is important. > > Hi Robert -- Not an expert on pairwiseAlignment, but I think the return > value is a class that contains more information than it is showing > > > xx = pairwiseAlignment(s2, s1, type="global") > > class(xx) > [1] "PairwiseAlignmentsSingleSubject" > attr(,"package") > [1] "Biostrings" > > class?PairwiseAlignmentsSingleSubject > > aligned(xx) > A DNAStringSet instance of length 1 > width seq > [1] 6 --GG-- > > Martin > >> >> My question seems related to a previous post: >> https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html >> >> Thank you in advance. >> >> Best wishes, >> Rob >> > >
ADD REPLYlink written 6.5 years ago by Robert K. Bradley40
On 06/15/2013 09:18 AM, Robert K. Bradley wrote: > Thanks Martin. as.character (aligned (...)) does return a gapped string for one > of the two aligned sequences, but even that command trims off unaligned > characters. For example: > For example: > > s1 = DNAString ("AGGGG") > > s2 = DNAString ("GGGGA") > > xx = pairwiseAlignment (s1, s2, gapOpening = -1) > > as.character (xx) > [1] "GGGG-" > while the whole gapped string is "AGGGG-". > > I apologize in advance if I'm missing something obvious here. Not really doing much better on my end; I'll offer this up then step aside until someone with more expertise chimes in. Maybe this is closer to what you're after, padding aligned(xx) with and left or right overhang of pattern lraligned <- function(pattern, subject, ...) { xx <- pairwiseAlignment (pattern, subject, gapOpening = -1) aln <- aligned(xx) lpad <- substr(pattern, 1L, start(pattern(xx)) - 1L) rpad <- substr(pattern, end(pattern(xx)) + 1L, nchar(pattern)) paste0(lpad, aln, rpad) } > lraligned(DNAString ("AGGGG"), DNAString ("GGGGA"), type="global") [1] "AGGGG-" > lraligned(DNAString ("GGGGA"), DNAString ("AGGGG"), type="global") [1] "-GGGGA" > lraligned(DNAString("AAGGAA"), DNAString("GG")) [1] "AAGGAA" > lraligned(DNAString("GG"), DNAString("AAGGAA")) [1] "--GG--" Martin > > Best, > Rob > > On 6/15/13 6:45 AM, Martin Morgan wrote: >> On 06/14/2013 07:57 PM, Robert K.Bradley [guest] wrote: >>> >>> Hello, >>> >>> I'm interested in using the pairwiseAlignment function in Biostrings >>> to perform simple alignments of short sequences. I noticed that even >>> the global alignment mode returns an alignment with gapped ends >>> trimmed off. For example: >>> >>>> s1 = DNAString ("AAGGAA") >>>> s2 = DNAString ("GG") >>>> pairwiseAlignment (s1, s2, type = "global") >>> Global PairwiseAlignmentsSingleSubject (1 of 1) >>> pattern: [3] GG >>> subject: [1] GG >>> score: -32.03649 >>> >>> Is it possible to obtain the complete alignment including the >>> (possibly) gapped ends? In this case, that would be: >>> --GG-- >>> AAGGAA >>> For my purposes, having the complete gapped alignment is important. >> >> Hi Robert -- Not an expert on pairwiseAlignment, but I think the return >> value is a class that contains more information than it is showing >> >> > xx = pairwiseAlignment(s2, s1, type="global") >> > class(xx) >> [1] "PairwiseAlignmentsSingleSubject" >> attr(,"package") >> [1] "Biostrings" >> > class?PairwiseAlignmentsSingleSubject >> > aligned(xx) >> A DNAStringSet instance of length 1 >> width seq >> [1] 6 --GG-- >> >> Martin >> >>> >>> My question seems related to a previous post: >>> https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html >>> >>> Thank you in advance. >>> >>> Best wishes, >>> Rob >>> >> >> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLYlink written 6.5 years ago by Martin Morgan ♦♦ 24k
Hi guys, It was an early decision of the author of pairwiseAlignment() to not display the indels that occur at the ends of the aligned sequences. The consequence of this is that the alignment "looks" like it's a local alignment even though it's global. Using writePairwiseAlignments() to display the alignments disambiguates this. With your 1st example: s1 <- DNAString("AAGGAA") s2 <- DNAString("GG") xx <- pairwiseAlignment(s1, s2, type="global") Then: > xx Global PairwiseAlignmentsSingleSubject (1 of 1) pattern: [3] GG subject: [1] GG score: -32.03649 > writePairwiseAlignments(xx) ######################################## # Program: Biostrings (version 2.29.7), a Bioconductor package # Rundate: Wed Jun 19 11:15:27 2013 ######################################## #======================================= # # Aligned_sequences: 2 # 1: P1 # 2: S1 # Matrix: NA # Gap_penalty: 14.0 # Extend_penalty: 4.0 # # Length: 6 # Identity: 2/6 (33.3%) # Similarity: NA/6 (NA%) # Gaps: 4/6 (66.7%) # Score: -32.03649 # # #======================================= P1 1 AAGGAA 6 || S1 1 --GG-- 2 #--------------------------------------- #--------------------------------------- With your 2nd example: s1 <- DNAString("AGGGG") s2 <- DNAString("GGGGA") xx <- pairwiseAlignment(s1, s2, gapOpening=-1) Then: > xx Global PairwiseAlignmentsSingleSubject (1 of 1) pattern: [2] GGGG subject: [1] GGGG score: -2.072976 > writePairwiseAlignments(xx) ######################################## # Program: Biostrings (version 2.29.7), a Bioconductor package # Rundate: Wed Jun 19 11:19:40 2013 ######################################## #======================================= # # Aligned_sequences: 2 # 1: P1 # 2: S1 # Matrix: NA # Gap_penalty: 5.0 # Extend_penalty: 4.0 # # Length: 6 # Identity: 4/6 (66.7%) # Similarity: NA/6 (NA%) # Gaps: 2/6 (33.3%) # Score: -2.072976 # # #======================================= P1 1 AGGGG- 5 |||| S1 1 -GGGGA 5 #--------------------------------------- #--------------------------------------- It's sad though that 'aligned(pattern(xx))' and 'aligned(subject(xx))' also trim those indels that are at the ends because they are really part of the aligned sequences. However this behavior has been around for so long now that I'm not sure we want to touch it. What we could to is add an extra argument to the aligned() getter, say 'full.alignment', with default to FALSE, to let the user extract the whole thing. Would that be useful? Cheers, H. On 06/15/2013 10:19 AM, Martin Morgan wrote: > On 06/15/2013 09:18 AM, Robert K. Bradley wrote: >> Thanks Martin. as.character (aligned (...)) does return a gapped >> string for one >> of the two aligned sequences, but even that command trims off unaligned >> characters. For example: >> For example: >> > s1 = DNAString ("AGGGG") >> > s2 = DNAString ("GGGGA") >> > xx = pairwiseAlignment (s1, s2, gapOpening = -1) >> > as.character (xx) >> [1] "GGGG-" >> while the whole gapped string is "AGGGG-". >> >> I apologize in advance if I'm missing something obvious here. > > Not really doing much better on my end; I'll offer this up then step > aside until someone with more expertise chimes in. Maybe this is closer > to what you're after, padding aligned(xx) with and left or right > overhang of pattern > > lraligned <- function(pattern, subject, ...) { > xx <- pairwiseAlignment (pattern, subject, gapOpening = -1) > aln <- aligned(xx) > lpad <- substr(pattern, 1L, start(pattern(xx)) - 1L) > rpad <- substr(pattern, end(pattern(xx)) + 1L, nchar(pattern)) > paste0(lpad, aln, rpad) > } > > > lraligned(DNAString ("AGGGG"), DNAString ("GGGGA"), type="global") > [1] "AGGGG-" > > lraligned(DNAString ("GGGGA"), DNAString ("AGGGG"), type="global") > [1] "-GGGGA" > > lraligned(DNAString("AAGGAA"), DNAString("GG")) > [1] "AAGGAA" > > lraligned(DNAString("GG"), DNAString("AAGGAA")) > [1] "--GG--" > > Martin > >> >> Best, >> Rob >> >> On 6/15/13 6:45 AM, Martin Morgan wrote: >>> On 06/14/2013 07:57 PM, Robert K.Bradley [guest] wrote: >>>> >>>> Hello, >>>> >>>> I'm interested in using the pairwiseAlignment function in Biostrings >>>> to perform simple alignments of short sequences. I noticed that even >>>> the global alignment mode returns an alignment with gapped ends >>>> trimmed off. For example: >>>> >>>>> s1 = DNAString ("AAGGAA") >>>>> s2 = DNAString ("GG") >>>>> pairwiseAlignment (s1, s2, type = "global") >>>> Global PairwiseAlignmentsSingleSubject (1 of 1) >>>> pattern: [3] GG >>>> subject: [1] GG >>>> score: -32.03649 >>>> >>>> Is it possible to obtain the complete alignment including the >>>> (possibly) gapped ends? In this case, that would be: >>>> --GG-- >>>> AAGGAA >>>> For my purposes, having the complete gapped alignment is important. >>> >>> Hi Robert -- Not an expert on pairwiseAlignment, but I think the return >>> value is a class that contains more information than it is showing >>> >>> > xx = pairwiseAlignment(s2, s1, type="global") >>> > class(xx) >>> [1] "PairwiseAlignmentsSingleSubject" >>> attr(,"package") >>> [1] "Biostrings" >>> > class?PairwiseAlignmentsSingleSubject >>> > aligned(xx) >>> A DNAStringSet instance of length 1 >>> width seq >>> [1] 6 --GG-- >>> >>> Martin >>> >>>> >>>> My question seems related to a previous post: >>>> https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html >>>> >>>> Thank you in advance. >>>> >>>> Best wishes, >>>> Rob >>>> >>> >>> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 6.5 years ago by Hervé Pagès ♦♦ 14k
Thanks Herv?. Adding such an option 'full.alignment' would be extremely useful to me, since it would let me get access to the complete global alignment with minimal string parsing. On 6/19/13 11:33 AM, Hervé Pagès wrote: > Hi guys, > > It was an early decision of the author of pairwiseAlignment() to not > display the indels that occur at the ends of the aligned sequences. > The consequence of this is that the alignment "looks" like it's a > local alignment even though it's global. > > Using writePairwiseAlignments() to display the alignments disambiguates > this. > > With your 1st example: > > s1 <- DNAString("AAGGAA") > s2 <- DNAString("GG") > xx <- pairwiseAlignment(s1, s2, type="global") > > Then: > > > xx > Global PairwiseAlignmentsSingleSubject (1 of 1) > pattern: [3] GG > subject: [1] GG > score: -32.03649 > > > writePairwiseAlignments(xx) > ######################################## > # Program: Biostrings (version 2.29.7), a Bioconductor package > # Rundate: Wed Jun 19 11:15:27 2013 > ######################################## > #======================================= > # > # Aligned_sequences: 2 > # 1: P1 > # 2: S1 > # Matrix: NA > # Gap_penalty: 14.0 > # Extend_penalty: 4.0 > # > # Length: 6 > # Identity: 2/6 (33.3%) > # Similarity: NA/6 (NA%) > # Gaps: 4/6 (66.7%) > # Score: -32.03649 > # > # > #======================================= > > P1 1 AAGGAA 6 > || > S1 1 --GG-- 2 > > > #--------------------------------------- > #--------------------------------------- > > With your 2nd example: > > s1 <- DNAString("AGGGG") > s2 <- DNAString("GGGGA") > xx <- pairwiseAlignment(s1, s2, gapOpening=-1) > > Then: > > > xx > Global PairwiseAlignmentsSingleSubject (1 of 1) > pattern: [2] GGGG > subject: [1] GGGG > score: -2.072976 > > > writePairwiseAlignments(xx) > ######################################## > # Program: Biostrings (version 2.29.7), a Bioconductor package > # Rundate: Wed Jun 19 11:19:40 2013 > ######################################## > #======================================= > # > # Aligned_sequences: 2 > # 1: P1 > # 2: S1 > # Matrix: NA > # Gap_penalty: 5.0 > # Extend_penalty: 4.0 > # > # Length: 6 > # Identity: 4/6 (66.7%) > # Similarity: NA/6 (NA%) > # Gaps: 2/6 (33.3%) > # Score: -2.072976 > # > # > #======================================= > > P1 1 AGGGG- 5 > |||| > S1 1 -GGGGA 5 > > > #--------------------------------------- > #--------------------------------------- > > It's sad though that 'aligned(pattern(xx))' and 'aligned(subject(xx))' > also trim those indels that are at the ends because they are really > part of the aligned sequences. However this behavior has been around for > so long now that I'm not sure we want to touch it. What we could to > is add an extra argument to the aligned() getter, say 'full.alignment', > with default to FALSE, to let the user extract the whole thing. > > Would that be useful? > > Cheers, > H. > > > On 06/15/2013 10:19 AM, Martin Morgan wrote: >> On 06/15/2013 09:18 AM, Robert K. Bradley wrote: >>> Thanks Martin. as.character (aligned (...)) does return a gapped >>> string for one >>> of the two aligned sequences, but even that command trims off unaligned >>> characters. For example: >>> For example: >>> > s1 = DNAString ("AGGGG") >>> > s2 = DNAString ("GGGGA") >>> > xx = pairwiseAlignment (s1, s2, gapOpening = -1) >>> > as.character (xx) >>> [1] "GGGG-" >>> while the whole gapped string is "AGGGG-". >>> >>> I apologize in advance if I'm missing something obvious here. >> >> Not really doing much better on my end; I'll offer this up then step >> aside until someone with more expertise chimes in. Maybe this is closer >> to what you're after, padding aligned(xx) with and left or right >> overhang of pattern >> >> lraligned <- function(pattern, subject, ...) { >> xx <- pairwiseAlignment (pattern, subject, gapOpening = -1) >> aln <- aligned(xx) >> lpad <- substr(pattern, 1L, start(pattern(xx)) - 1L) >> rpad <- substr(pattern, end(pattern(xx)) + 1L, nchar(pattern)) >> paste0(lpad, aln, rpad) >> } >> >> > lraligned(DNAString ("AGGGG"), DNAString ("GGGGA"), type="global") >> [1] "AGGGG-" >> > lraligned(DNAString ("GGGGA"), DNAString ("AGGGG"), type="global") >> [1] "-GGGGA" >> > lraligned(DNAString("AAGGAA"), DNAString("GG")) >> [1] "AAGGAA" >> > lraligned(DNAString("GG"), DNAString("AAGGAA")) >> [1] "--GG--" >> >> Martin >> >>> >>> Best, >>> Rob >>> >>> On 6/15/13 6:45 AM, Martin Morgan wrote: >>>> On 06/14/2013 07:57 PM, Robert K.Bradley [guest] wrote: >>>>> >>>>> Hello, >>>>> >>>>> I'm interested in using the pairwiseAlignment function in Biostrings >>>>> to perform simple alignments of short sequences. I noticed that even >>>>> the global alignment mode returns an alignment with gapped ends >>>>> trimmed off. For example: >>>>> >>>>>> s1 = DNAString ("AAGGAA") >>>>>> s2 = DNAString ("GG") >>>>>> pairwiseAlignment (s1, s2, type = "global") >>>>> Global PairwiseAlignmentsSingleSubject (1 of 1) >>>>> pattern: [3] GG >>>>> subject: [1] GG >>>>> score: -32.03649 >>>>> >>>>> Is it possible to obtain the complete alignment including the >>>>> (possibly) gapped ends? In this case, that would be: >>>>> --GG-- >>>>> AAGGAA >>>>> For my purposes, having the complete gapped alignment is important. >>>> >>>> Hi Robert -- Not an expert on pairwiseAlignment, but I think the return >>>> value is a class that contains more information than it is showing >>>> >>>> > xx = pairwiseAlignment(s2, s1, type="global") >>>> > class(xx) >>>> [1] "PairwiseAlignmentsSingleSubject" >>>> attr(,"package") >>>> [1] "Biostrings" >>>> > class?PairwiseAlignmentsSingleSubject >>>> > aligned(xx) >>>> A DNAStringSet instance of length 1 >>>> width seq >>>> [1] 6 --GG-- >>>> >>>> Martin >>>> >>>>> >>>>> My question seems related to a previous post: >>>>> https://stat.ethz.ch/pipermail/bioconductor/2012-February/043577.html >>>>> >>>>> Thank you in advance. >>>>> >>>>> Best wishes, >>>>> Rob >>>>> >>>> >>>> >> >> >
ADD REPLYlink written 6.5 years ago by Robert K. Bradley40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 196 users visited in the last hour