Extracting the findPalindrome() results
meyerlaker • 0
@3a6449cf
Austria

Hello!

I am trying to find palindromes in a whole gene sequence with findPalindromes() and save them. I then want to look for overlaps between these palindromes and the nucleotides that are mismatched between two organisms, in order to design strain-specific qPCR probes and primers. I tried converting the findPalindromes() output to a data frame and exporting it via export.fasta() from the bios2mds package, but the data frame only has one column with the palindromes and not their locations. Can you help me?

I am new to programming and bioinformatics, so sorry if it's a dumb question or an obvious answer ;-)

All the best! Vicki

@herve-pages-1542
Seattle, WA, United States

Hi,

I don't know anything about the bios2mds package (it doesn't seem to be a Bioconductor package). Note that you don't need to turn the output of Biostrings::findPalindromes() into a data.frame; this could be very inefficient. Instead, turn it into a DNAStringSet object, add the ranges as the names of this object, and write the object to a FASTA file with Biostrings::writeXStringSet(). It should look something like this:

library(Biostrings)

...

pals <- findPalindromes(...)

sequences <- as(pals, "DNAStringSet")
names(sequences) <- as.character(as(pals, "IRanges"))
writeXStringSet(sequences, "path/to/file.fa")
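
As a minimal end-to-end sketch of the steps above: the toy sequence, the min.armlength value and the temporary output file are assumptions made up for illustration, not taken from the original question. It writes the palindromes to a FASTA file with their ranges as record names, then reads the file back to check the round trip.

```r
library(Biostrings)

# Toy DNA subject, invented for illustration only
subject <- DNAString("ACGTTTAAACCGGAATTCCA")

# Assumed setting: arms of at least 3 nucleotides
pals <- findPalindromes(subject, min.armlength = 3)

# Convert the views to a DNAStringSet and use the ranges as names
sequences <- as(pals, "DNAStringSet")
names(sequences) <- as.character(as(pals, "IRanges"))

# Write to a temporary FASTA file; each header line carries the range
fasta_path <- tempfile(fileext = ".fa")
writeXStringSet(sequences, fasta_path)

# Read the file back to confirm sequences and names survived
back <- readDNAStringSet(fasta_path)
print(back)
```

Replace tempfile() with a real path to keep the file. The record names are the ranges as character strings, so the locations can be parsed back out of the FASTA headers later.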


Hope this helps.

H.


Also, you have a point: the output of Biostrings::findPalindromes() is an XStringViews object and as.data.frame() does a poor job on these objects:

pals
# Views on a 34-letter BString subject
# subject: abbbaabbcbbaccacabbbccbcaabbabacca
# views:
#       start end width
#   [1]     3   8     6 [bbaabb]
#   [2]     6  12     7 [abbcbba]
#   [3]    10  19    10 [bbaccacabb]

as.data.frame(pals)
#            x
# 1     bbaabb
# 2    abbcbba
# 3 bbaccacabb


I've just changed this in the devel version of Biostrings. Now it does:

as.data.frame(pals)
#   start end width        seq
# 1     3   8     6     bbaabb
# 2     6  12     7    abbcbba
# 3    10  19    10 bbaccacabb


This is with Biostrings 2.59.3 (part of BioC 3.13, not released yet).
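
With an older Biostrings where as.data.frame() still returns a single column, a similar table can probably be assembled by hand from the accessors start(), end(), width() and as.character(). This is only a sketch: the toy sequence, the min.armlength value and the name pal_table are assumptions for illustration.

```r
library(Biostrings)

# Toy DNA subject, invented for illustration only
subject <- DNAString("ACGTTTAAACCGGAATTCCA")

# Assumed setting: arms of at least 3 nucleotides
pals <- findPalindromes(subject, min.armlength = 3)

# Build the table column by column from the views
pal_table <- data.frame(
  start = start(pals),
  end   = end(pals),
  width = width(pals),
  seq   = as.character(pals),
  stringsAsFactors = FALSE
)

print(pal_table)
```

The resulting data frame can then be saved with write.csv(pal_table, ..., row.names = FALSE) if a spreadsheet-style file is preferred over FASTA.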

Best,

H.


This worked great. Thanks a lot!! And cool to see the change in the new version :-)