pairwiseAlignments coordinate mapping?
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 12 weeks ago
United States

Sorry if I'm missing the obvious, but after I perform a pairwise alignment between two strings, I'd like to query positions in the pattern to learn what position it aligns to in the subject -- is this functionality already there and I'm missing it, or do I need to rig up a utility function?

For instance, suppose we have this example:

library(Biostrings)
(pw <- pairwiseAlignment('GTCA', 'GATACA'))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] GT--CA 
subject: [1] GATACA 
score: -17.95402 

I'd like to know that the 3rd (C) letter in pattern aligned to the 5th position in subject.

I can rig up something very basic (and probably breaks around many corner cases) that parses pattern(pw) (GT--CA) to count the dashes or whatever to translate one index to the other, but is there something more straightforward (and better thought out) already implemented that I'm missing?

 

biostrings pairwisealignment • 2.2k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 5 days ago
Seattle, WA, United States

Hi Steve,

I'm not aware of such functionality. The Pairwise Sequence Alignments vignette is pretty extensive and if this functionality is not mentioned there then that probably means it's not available. There is one gotcha with the parsing and counting dashes solution: the aligned pattern and/or subject returned by pattern(pw) and/or subject(pw) can be trimmed (on one or both sides):

pairwiseAlignment('CTTTTT', 'TTTTT')
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [2] TTTTT 
# subject: [1] TTTTT 
# score: -9.091025 

Here is a solution that leverages internal helper Biostrings:::.pre2postaligned() used by writePairwiseAlignments():

mappingDetails <- function(x)
{
    if (!(is(x, "PairwiseAlignments") && length(x) == 1L))
        stop("'x' must be a PairwiseAlignments object of length 1")
    pos_in_pattern <- as.integer(x@pattern@range)
    p2a <- data.frame(
        pos_in_pattern=pos_in_pattern,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_pattern, x@pattern)
    )
    pos_in_subject <- as.integer(x@subject@range)
    s2a <- data.frame(
        pos_in_subject=pos_in_subject,
        pos_in_pwa=Biostrings:::.pre2postaligned(pos_in_subject, x@subject)
    )
    df <- merge(p2a, s2a, all=TRUE)
    df[ , c("pos_in_pattern", "pos_in_pwa", "pos_in_subject")]
}

For your example:

(pw <- pairwiseAlignment('GTCA', 'GATACA'))
# Global PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [1] GT--CA 
# subject: [1] GATACA 
# score: -17.95402 

mappingDetails(pw)
#   pos_in_pattern pos_in_pwa pos_in_subject
# 1              1          1              1
# 2              2          2              2
# 3             NA          3              3
# 4             NA          4              4
# 5              3          5              5
# 6              4          6              6

The middle column (pos_in_pwa) is the position in the aligned strings.

Cheers,

H.

ADD COMMENT
0
Entering edit mode

You are a hero. Thanks Hervé!

ADD REPLY
0
Entering edit mode

I was facing a similar issue. Turns out that the coordinates of the subject where the query sequences aligns exact can be accessed as follows:

query<- readAAStringSet("tt.fa") ## Multifasta file
reference<- readAAStringSet("poly.fasta")
pwg <- pwalign::pairwiseAlignment(aas,poly,type="local")
pwg@subject@range
IRanges object with 27 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         5       114       110
   [2]      1493      1642       150
   [3]      1661      1806       146
   [4]      1808      1952       145
   [5]      2099      2242       144
   ...       ...       ...       ...
  [23]       776      1127       352
  [24]      1128      1345       218
  [25]      1346      1475       130
  [26]      1476      2094       619
  [27]      2095      2244       150
ADD REPLY

Login before adding your answer.

Traffic: 1158 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6