Question

pairwiseAlignments coordinate mapping?

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 12 weeks ago

United States

Sorry if I'm missing the obvious, but after I perform a pairwise alignment between two strings, I'd like to query positions in the pattern to learn what position it aligns to in the subject -- is this functionality already there and I'm missing it, or do I need to rig up a utility function?

For instance, suppose we have this example:

library(Biostrings)
(pw <- pairwiseAlignment('GTCA', 'GATACA'))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] GT--CA 
subject: [1] GATACA 
score: -17.95402

I'd like to know that the 3rd (C) letter in pattern aligned to the 5th position in subject.

I can rig up something very basic (and probably breaks around many corner cases) that parses pattern(pw) (GT--CA) to count the dashes or whatever to translate one index to the other, but is there something more straightforward (and better thought out) already implemented that I'm missing?

biostrings pairwisealignment • 2.2k views

ADD COMMENT • link updated 8 weeks ago by rohitsatyam102 ▴ 20 • written 9.1 years ago by Steve Lianoglou ★ 13k

score 3 · Accepted Answer · 2016-05-19

Hi Steve,

I'm not aware of such functionality. The Pairwise Sequence Alignments vignette is pretty extensive and if this functionality is not mentioned there then that probably means it's not available. There is one gotcha with the parsing and counting dashes solution: the aligned pattern and/or subject returned by pattern(pw) and/or subject(pw) can be trimmed (on one or both sides):

pairwiseAlignment('CTTTTT', 'TTTTT')
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [2] TTTTT 
# subject: [1] TTTTT 
# score: -9.091025

Here is a solution that leverages internal helper Biostrings:::.pre2postaligned() used by writePairwiseAlignments():

mappingDetails <- function(x)
{
    if (!(is(x, "PairwiseAlignments") && length(x) == 1L))
        stop("'x' must be a PairwiseAlignments object of length 1")
    pos_in_pattern <- as.integer(x@pattern@range)
    p2a <- data.frame(
        pos_in_pattern=pos_in_pattern,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_pattern, x@pattern)
    )
    pos_in_subject <- as.integer(x@subject@range)
    s2a <- data.frame(
        pos_in_subject=pos_in_subject,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_subject, x@subject)
    )
    df <- merge(p2a, s2a, all=TRUE)
    df[ , c("pos_in_pattern", "pos_in_pwa", "pos_in_subject")]
}

For your example:

(pw <- pairwiseAlignment('GTCA', 'GATACA'))
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [1] GT--CA 
# subject: [1] GATACA 
# score: -17.95402 

mappingDetails(pw)
#   pos_in_pattern pos_in_pwa pos_in_subject
# 1              1          1              1
# 2              2          2              2
# 3             NA          3              3
# 4             NA          4              4
# 5              3          5              5
# 6              4          6              6

The middle column (pos_in_pwa) is the position in the aligned strings.

Cheers,

H.