findComplementedPalindromes function is not finding all complementary palindromes
ggarzaz39 • 0

I noticed that the findComplementedPalindromes() function does not consistently find all complemented palindromes. For example, in the code below, both strings contain a complemented palindrome that is 5 nucleotides long. In fact, because the second string is the reverse of the first, it contains the same complemented palindromes as the first, only reversed.

library(Biostrings) 

seq1 = DNAString("TTTAA")
findComplementedPalindromes(seq1, min.armlength=2)

seq2 = DNAString("AATTT")
findComplementedPalindromes(seq2, min.armlength=2)

However, this is the output I get:

> seq1 = DNAString("TTTAA")
> findComplementedPalindromes(seq1, min.armlength=2)
  Views on a 5-letter DNAString subject
subject: TTTAA
views:
    start end width
[1]     1   5     5 [TTTAA]
[2]     2   5     4 [TTAA]
>
> seq2 = DNAString("AATTT")
> findComplementedPalindromes(seq2, min.armlength=2)
  Views on a 5-letter DNAString subject
subject: AATTT
views:
    start end width
[1]     1   4     4 [AATT]

The output for the second string does not include a 5-nucleotide-long complemented palindrome, although the output for the first string does. Is this a bug or expected behavior?
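To make the expectation concrete, here is a minimal brute-force sketch in base R (no Biostrings needed). It is not how Biostrings searches; `revcomp()` and `find_regions()` are hypothetical helpers written for this illustration, and the allowance of a loop of at most 1 nucleotide between the arms is an assumption chosen to match the 5-wide hit reported for the first string.

```r
# Complement lookup for the four unambiguous DNA letters.
comp <- c(A = "T", C = "G", G = "C", T = "A")

# Reverse complement of a plain character string (base R only).
revcomp <- function(s) {
  chars <- strsplit(s, "")[[1]]
  paste(rev(comp[chars]), collapse = "")
}

# Hypothetical helper: list every region made of a left arm, an optional
# loop, and a right arm that is the reverse complement of the left arm.
find_regions <- function(s, min_arm = 2, max_loop = 1) {
  n <- nchar(s)
  if (n < 2 * min_arm) return(NULL)  # too short to hold two arms
  hits <- list()
  for (start in seq_len(n)) {
    for (arm in min_arm:(n %/% 2)) {
      for (loop in 0:max_loop) {
        end <- start + 2 * arm + loop - 1
        if (end > n) next
        left  <- substr(s, start, start + arm - 1)
        right <- substr(s, end - arm + 1, end)
        if (left == revcomp(right)) {
          hits[[length(hits) + 1]] <-
            c(start = start, end = end, width = end - start + 1)
        }
      }
    }
  }
  # Different (arm, loop) pairs can describe the same region; keep each once.
  unique(do.call(rbind, hits))
}

r1 <- find_regions("TTTAA")
r2 <- find_regions("AATTT")
```

Under these assumptions, `r1` holds the regions 1-5 and 2-5, and `r2` holds 1-4 and 1-5, so a 5-wide region is expected for both strings.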

The output of sessionInfo() is:

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin14.0.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Biostrings_2.34.0   XVector_0.6.0       IRanges_2.0.0
[4] S4Vectors_0.4.0     BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.12.0

Biostrings • 1.6k views

Hi ggarzaz39,

Sorry for the late answer. You found a bug. Thanks for reporting! Will get back to you when it's fixed.

H.

@herve-pages-1542

Hi ggarzaz39,

This is fixed in Biostrings 2.34.1 (release) and 2.35.6 (devel). Both packages should become available in 20 hours or so via biocLite(). 2 other issues were also addressed. Here is the summary:

  1. All palindromic regions are now returned (previously, palindromic regions contained in another palindromic region would sometimes be mistakenly discarded from the result).
  2. IUPAC ambiguity codes are not allowed in the arms of a palindromic region anymore.
  3. The arm length returned by palindromeArmLength() or complementedPalindromeArmLength() cannot be more than half the length of the palindromic sequence anymore. Used to be the case that, for a perfect palindromic sequence (i.e. a sequence identical to its reverse or reverse complement), the returned arm length was the length of the entire sequence.
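Points 2 and 3 can be illustrated with a small base R sketch. This is only an approximation of the behavior described above, not the package's implementation; `arm_length()` is a hypothetical helper that assumes no loop and no mismatches.

```r
# Complement lookup restricted to A/C/G/T, so ambiguity codes never pair.
comp <- c(A = "T", C = "G", G = "C", T = "A")

# Hypothetical helper: length of the complemented-palindrome arms of s,
# counted from both ends inward.
arm_length <- function(s) {
  chars <- strsplit(s, "")[[1]]
  n <- length(chars)
  arm <- 0
  # Stop at the middle, so the arm is never more than half the sequence.
  while (arm < n %/% 2) {
    left  <- chars[arm + 1]
    right <- chars[n - arm]
    # Letters outside A/C/G/T (e.g. the IUPAC code N) cannot extend an arm.
    if (!(left %in% names(comp)) || !(right %in% names(comp))) break
    if (comp[[left]] != right) break
    arm <- arm + 1
  }
  arm
}

arm_length("AATT")   # perfect complemented palindrome: 2, not 4
arm_length("TTTAA")  # arms TT and AA around a 1-letter loop: 2
arm_length("ANNT")   # the N letters stop the arm after one pair: 1
```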

I'm also working on adding support for max.mismatch != 0 (promised a long time ago and requested recently by someone). This will only become available in BioC devel (Biostrings >= 2.35.7). While we are at it, I'm also tempted to simplify the terminology and get rid of the findComplementedPalindromes() function, keeping only findPalindromes(). findPalindromes() would then behave like findComplementedPalindromes() on DNA and RNA sequences. If someone really wants to find classic text palindromes in DNA or RNA, s/he would need to coerce the input to BString first. But why would anybody want to do this, since it has no biological meaning, right? So overall, getting rid of all the *Complemented*() functions would cut the number of functions in half and be much less confusing. Do you agree?
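For readers unsure about the two notions being merged here, the following base R sketch contrasts a classic text palindrome with a complemented palindrome. The helper names are hypothetical and the code only checks whole strings; it does not reproduce what findPalindromes() does on a BString or a DNAString.

```r
comp <- c(A = "T", C = "G", G = "C", T = "A")

# Text palindrome: the string reads the same backwards.
is_text_palindrome <- function(s) {
  chars <- strsplit(s, "")[[1]]
  identical(chars, rev(chars))
}

# Complemented palindrome: the string equals its own reverse complement.
is_complemented_palindrome <- function(s) {
  chars <- strsplit(s, "")[[1]]
  identical(chars, unname(rev(comp[chars])))
}

is_text_palindrome("ATTA")           # TRUE
is_complemented_palindrome("ATTA")   # FALSE
is_text_palindrome("AATT")           # FALSE
is_complemented_palindrome("AATT")   # TRUE
```

Only the second notion corresponds to a double-stranded site that reads the same on both strands, which is why it is the biologically meaningful one for DNA and RNA.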

Thanks,

H.

 


Thanks. I did wonder if anyone used the findPalindromes() function. My guess is that some people's code will be broken if you get rid of the *Complemented*() functions. It wouldn't be great for reproducibility.


When I say "get rid" I actually mean "deprecate" so people can still use the function but they get a warning that they need to start using findPalindromes() instead. This will make the transition as smooth as it can be. Reproducibility means that people can re-run an analysis anytime using the same package versions that were originally used. It doesn't mean that the software cannot evolve.
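As a rough illustration of what deprecation looks like in R, here is a sketch using base R's `.Deprecated()`. The function names are hypothetical stand-ins and the body of the new function is a placeholder; this is not the Biostrings source.

```r
# Hypothetical new function (placeholder body for the illustration).
find_pals_new <- function(s) {
  nchar(s)
}

# Hypothetical old function: still works, but warns and delegates.
find_pals_old <- function(s) {
  .Deprecated("find_pals_new")
  find_pals_new(s)
}

# Existing code keeps running and gets the same result plus a warning.
res <- suppressWarnings(find_pals_old("AATT"))
```

Calling `find_pals_old()` without `suppressWarnings()` emits a deprecation warning that points to `find_pals_new()`.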

So I went ahead and did this in Biostrings 2.35.7. I also added support for max.mismatch != 0.

Cheers,

H.

 
