pairwiseAlignment() - Similarity NA% ?
1
0
Entering edit mode
@borissteipe-7050
Last seen 7.8 years ago
Canada

pairwiseAlignment() returns NA values for similarity length and similarity %:

MWE:

p1 <- AAString(in2seq("VDVYEFIHSTGSIMKRKKDDWVNATHILK"))
q1 <- AAString(in2seq("IPVFEYPLNGQYIMIDCETGMVHFTGIWK"))
ali <- pairwiseAlignment(p1, q1,
                         substitutionMatrix = "BLOSUM62",
                         gapOpening = 1, gapExtension = 1)
writePairwiseAlignments(ali)

----

[...]
# Length: 34
# Identity: 10/34 (29.4%)
# Similarity:    NA/34 (NA%)
# Gaps:          10/34 (29.4%)
# Score: 36
[...]

(EMBOSS needle with the same parameters gives 48% similarity.)

Is it expected that pairwiseAlignment() produces NA in this case?

Thanks!

Boris

 

-----

R > sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.36.4    XVector_0.8.0        IRanges_2.2.9        S4Vectors_0.6.6     
[5] BiocGenerics_0.14.0  BiocInstaller_1.18.4 seqinr_3.1-3         ade4_1.7-2          
[9] stringr_1.0.0       

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0 magrittr_1.5    tools_3.2.2     stringi_0.5-5  
biostrings pairwisealignment • 1.4k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 13 hours ago
Seattle, WA, United States

Hi Boris,

writePairwiseAlignments() doesn't display the percent sequence similarity because there is no consensus on how this should be calculated. Note that it's the same situation for the percent sequence identity (see ?pid) even though writePairwiseAlignments() displays it.

See ?pid for the 4 different types of percent sequence identity that are supported. Also see selectMethod("pid", "PairwiseAlignments") for the implementation. As you can see the numerator is always 100*nmatch(x) (where x is the object returned by pairwiseAlignment()) and the denominator varies:

  1. PID1: nchar(x)
  2. PID2: nmatch(x) + nmismatch(x)
  3. PID3: pmin(nchar(unaligned(pattern(x))), nchar(unaligned(subject(x))))
  4. PID4: (nchar(unaligned(pattern(x))) + nchar(unaligned(subject(x)))) / 2

It should be easy to define your own similarity function along these lines. If none of these denominators is what you want, you can always use a denominator that that you feel is more appropriate to your use case e.g. nchar(unaligned(pattern(x)))' or 'nchar(unaligned(subject(x))).

Once you have a similarity function that returns the same as EMBOSS, I would gladly take it and add it to Biostrings.

Cheers,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 663 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6