Search
Question: pairwiseAlignments coordinate mapping?
0
2.2 years ago by
Denali
Steve Lianoglou12k wrote:

Sorry if I'm missing the obvious, but after I perform a pairwise alignment between two strings, I'd like to query positions in the pattern to learn what position it aligns to in the subject -- is this functionality already there and I'm missing it, or do I need to rig up a utility function?

For instance, suppose we have this example:

library(Biostrings)
(pw <- pairwiseAlignment('GTCA', 'GATACA'))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] GT--CA
subject: [1] GATACA
score: -17.95402 

I'd like to know that the 3rd (C) letter in pattern aligned to the 5th position in subject.

I can rig up something very basic (and probably breaks around many corner cases) that parses pattern(pw) (GT--CA) to count the dashes or whatever to translate one index to the other, but is there something more straightforward (and better thought out) already implemented that I'm missing?

modified 2.2 years ago by Hervé Pagès ♦♦ 13k • written 2.2 years ago by Steve Lianoglou12k
3
2.2 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Steve,

I'm not aware of such functionality. The Pairwise Sequence Alignments vignette is pretty extensive and if this functionality is not mentioned there then that probably means it's not available. There is one gotcha with the parsing and counting dashes solution: the aligned pattern and/or subject returned by pattern(pw) and/or subject(pw) can be trimmed (on one or both sides):

pairwiseAlignment('CTTTTT', 'TTTTT')
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [2] TTTTT
# subject: [1] TTTTT
# score: -9.091025 

Here is a solution that leverages internal helper Biostrings:::.pre2postaligned() used by writePairwiseAlignments():

mappingDetails <- function(x)
{
if (!(is(x, "PairwiseAlignments") && length(x) == 1L))
stop("'x' must be a PairwiseAlignments object of length 1")
pos_in_pattern <- as.integer(x@pattern@range)
p2a <- data.frame(
pos_in_pattern=pos_in_pattern,
pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_pattern, x@pattern)
)
pos_in_subject <- as.integer(x@subject@range)
s2a <- data.frame(
pos_in_subject=pos_in_subject,
pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_subject, x@subject)
)
df <- merge(p2a, s2a, all=TRUE)
df[ , c("pos_in_pattern", "pos_in_pwa", "pos_in_subject")]
}

(pw <- pairwiseAlignment('GTCA', 'GATACA'))
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [1] GT--CA
# subject: [1] GATACA
# score: -17.95402

mappingDetails(pw)
#   pos_in_pattern pos_in_pwa pos_in_subject
# 1              1          1              1
# 2              2          2              2
# 3             NA          3              3
# 4             NA          4              4
# 5              3          5              5
# 6              4          6              6

The middle column (pos_in_pwa) is the position in the aligned strings.

Cheers,

H.