pairwiseAlignments coordinate mapping?
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 21 months ago
United States

Sorry if I'm missing the obvious, but after I perform a pairwise alignment between two strings, I'd like to query positions in the pattern to learn what position it aligns to in the subject -- is this functionality already there and I'm missing it, or do I need to rig up a utility function?

For instance, suppose we have this example:

library(Biostrings)
(pw <- pairwiseAlignment('GTCA', 'GATACA'))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] GT--CA 
subject: [1] GATACA 
score: -17.95402 

I'd like to know that the 3rd (C) letter in pattern aligned to the 5th position in subject.

I can rig up something very basic (and probably breaks around many corner cases) that parses pattern(pw) (GT--CA) to count the dashes or whatever to translate one index to the other, but is there something more straightforward (and better thought out) already implemented that I'm missing?

 

biostrings pairwisealignment • 1.6k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 17 hours ago
Seattle, WA, United States

Hi Steve,

I'm not aware of such functionality. The Pairwise Sequence Alignments vignette is pretty extensive and if this functionality is not mentioned there then that probably means it's not available. There is one gotcha with the parsing and counting dashes solution: the aligned pattern and/or subject returned by pattern(pw) and/or subject(pw) can be trimmed (on one or both sides):

pairwiseAlignment('CTTTTT', 'TTTTT')
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [2] TTTTT 
# subject: [1] TTTTT 
# score: -9.091025 

Here is a solution that leverages internal helper Biostrings:::.pre2postaligned() used by writePairwiseAlignments():

mappingDetails <- function(x)
{
    if (!(is(x, "PairwiseAlignments") && length(x) == 1L))
        stop("'x' must be a PairwiseAlignments object of length 1")
    pos_in_pattern <- as.integer(x@pattern@range)
    p2a <- data.frame(
        pos_in_pattern=pos_in_pattern,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_pattern, x@pattern)
    )
    pos_in_subject <- as.integer(x@subject@range)
    s2a <- data.frame(
        pos_in_subject=pos_in_subject,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_subject, x@subject)
    )
    df <- merge(p2a, s2a, all=TRUE)
    df[ , c("pos_in_pattern", "pos_in_pwa", "pos_in_subject")]
}

For your example:

(pw <- pairwiseAlignment('GTCA', 'GATACA'))
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [1] GT--CA 
# subject: [1] GATACA 
# score: -17.95402 

mappingDetails(pw)
#   pos_in_pattern pos_in_pwa pos_in_subject
# 1              1          1              1
# 2              2          2              2
# 3             NA          3              3
# 4             NA          4              4
# 5              3          5              5
# 6              4          6              6

The middle column (pos_in_pwa) is the position in the aligned strings.

Cheers,

H.

ADD COMMENT
0
Entering edit mode

You are a hero. Thanks Hervé!

ADD REPLY

Login before adding your answer.

Traffic: 745 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6