Extracting the findPalindromes() results
meyerlaker • 0

Hello!

I am trying to find palindromes in a whole gene sequence with findPalindromes() and save them, so that I can then look for overlaps between them and the nucleotides that are mismatched between two organisms, in order to design strain-specific qPCR probes and primers. I've tried converting the findPalindromes() output to a data frame and exporting it via export.fasta() from the bios2mds package, but the data frame only has one column with the palindromes and not their locations. Can you help me?

I am new to programming and bioinformatics, so sorry if it's a dumb question or has an obvious answer ;-)

All the best! Vicki

@herve-pages-1542

Hi,

I don't know anything about the bios2mds package (it doesn't seem to be a Bioconductor package). Note that you don't need to turn the output of Biostrings::findPalindromes() into a data.frame; this could be very inefficient. Instead, turn it into a DNAStringSet object, add the ranges as the names of this object, and write the object to a FASTA file with Biostrings::writeXStringSet(). It should look something like this:

library(Biostrings)

...

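pals <- findPalindromes(...)

# Extract the palindromic sequences from the views
sequences <- as(pals, "DNAStringSet")
# Use each palindrome's range as its FASTA record name
names(sequences) <- as.character(as(pals, "IRanges"))
# Write the named sequences to a FASTA file
writeXStringSet(sequences, "path/to/file.fa")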
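
For anyone who wants to try the steps above end to end, here is a minimal sketch on a toy input. The sequence, the min.armlength = 3 setting, and the temporary file path are assumptions chosen for illustration; they are not taken from the original question.

```r
library(Biostrings)

# Toy sequence (an assumption for illustration, not the poster's gene)
subject <- DNAString("TTGAATTCAAGGATCCTT")

# Find palindromes; min.armlength = 3 is an arbitrary choice for this toy input
pals <- findPalindromes(subject, min.armlength = 3)

# Extract the palindromic sequences and name each one after its range
sequences <- as(pals, "DNAStringSet")
names(sequences) <- as.character(as(pals, "IRanges"))

# Write to a temporary FASTA file, then read it back to inspect the record names
fasta_path <- tempfile(fileext = ".fa")
writeXStringSet(sequences, fasta_path)
read_back <- readDNAStringSet(fasta_path)
names(read_back)
```

Because the locations travel in the record names, they can be recovered later from the FASTA headers without a separate table.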

Hope this helps.

H.

Also, you have a point: the output of Biostrings::findPalindromes() is an XStringViews object and as.data.frame() does a poor job on these objects:

pals
# Views on a 34-letter BString subject
# subject: abbbaabbcbbaccacabbbccbcaabbabacca
# views:
#       start end width
#   [1]     3   8     6 [bbaabb]
#   [2]     6  12     7 [abbcbba]
#   [3]    10  19    10 [bbaccacabb]

as.data.frame(pals)
#            x
# 1     bbaabb
# 2    abbcbba
# 3 bbaccacabb

I've just changed this in the devel version of Biostrings. Now it does:

as.data.frame(pals)
#   start end width        seq
# 1     3   8     6     bbaabb
# 2     6  12     7    abbcbba
# 3    10  19    10 bbaccacabb

This is with Biostrings 2.59.3 (part of BioC 3.13, not released yet).
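
On a Biostrings version that predates this change, a similar table can probably be assembled by hand from the accessors on the views. This is only a sketch: the toy sequence, the min.armlength = 3 setting, and the pal_table name are assumptions for illustration, and the column names simply mirror the new output shown above.

```r
library(Biostrings)

# Toy sequence (an assumption for illustration)
subject <- DNAString("TTGAATTCAAGGATCCTT")
pals <- findPalindromes(subject, min.armlength = 3)

# Build the table from the view coordinates and the view sequences
pal_table <- data.frame(
  start = start(pals),
  end   = end(pals),
  width = width(pals),
  seq   = as.character(pals),
  stringsAsFactors = FALSE
)
pal_table
```

For long subjects with many palindromes, staying with the DNAStringSet approach is likely to be lighter than a data.frame.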

Best,

H.

This worked great. Thanks a lot!! And cool to see the change in the new version :-)
