Search
Question: pairwiseAlignments coordinate mapping?
0
gravatar for Steve Lianoglou
18 months ago by
Genentech
Steve Lianoglou12k wrote:

Sorry if I'm missing the obvious, but after I perform a pairwise alignment between two strings, I'd like to query positions in the pattern to learn what position it aligns to in the subject -- is this functionality already there and I'm missing it, or do I need to rig up a utility function?

For instance, suppose we have this example:

library(Biostrings)
(pw <- pairwiseAlignment('GTCA', 'GATACA'))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] GT--CA 
subject: [1] GATACA 
score: -17.95402 

I'd like to know that the 3rd (C) letter in pattern aligned to the 5th position in subject.

I can rig up something very basic (and probably breaks around many corner cases) that parses pattern(pw) (GT--CA) to count the dashes or whatever to translate one index to the other, but is there something more straightforward (and better thought out) already implemented that I'm missing?

 

ADD COMMENTlink modified 18 months ago by Hervé Pagès ♦♦ 13k • written 18 months ago by Steve Lianoglou12k
3
gravatar for Hervé Pagès
18 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Steve,

I'm not aware of such functionality. The Pairwise Sequence Alignments vignette is pretty extensive and if this functionality is not mentioned there then that probably means it's not available. There is one gotcha with the parsing and counting dashes solution: the aligned pattern and/or subject returned by pattern(pw) and/or subject(pw) can be trimmed (on one or both sides):

pairwiseAlignment('CTTTTT', 'TTTTT')
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [2] TTTTT 
# subject: [1] TTTTT 
# score: -9.091025 

Here is a solution that leverages internal helper Biostrings:::.pre2postaligned() used by writePairwiseAlignments():

mappingDetails <- function(x)
{
    if (!(is(x, "PairwiseAlignments") && length(x) == 1L))
        stop("'x' must be a PairwiseAlignments object of length 1")
    pos_in_pattern <- as.integer(x@pattern@range)
    p2a <- data.frame(
        pos_in_pattern=pos_in_pattern,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_pattern, x@pattern)
    )
    pos_in_subject <- as.integer(x@subject@range)
    s2a <- data.frame(
        pos_in_subject=pos_in_subject,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_subject, x@subject)
    )
    df <- merge(p2a, s2a, all=TRUE)
    df[ , c("pos_in_pattern", "pos_in_pwa", "pos_in_subject")]
}

For your example:

(pw <- pairwiseAlignment('GTCA', 'GATACA'))
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [1] GT--CA 
# subject: [1] GATACA 
# score: -17.95402 

mappingDetails(pw)
#   pos_in_pattern pos_in_pwa pos_in_subject
# 1              1          1              1
# 2              2          2              2
# 3             NA          3              3
# 4             NA          4              4
# 5              3          5              5
# 6              4          6              6

The middle column (pos_in_pwa) is the position in the aligned strings.

Cheers,

H.

ADD COMMENTlink modified 18 months ago • written 18 months ago by Hervé Pagès ♦♦ 13k

You are a hero. Thanks Hervé!

ADD REPLYlink written 18 months ago by Steve Lianoglou12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 165 users visited in the last hour