Question: matchPWM on DNAStringSet rather than just one sequence?
23 months ago by
Jake60
United States
Jake60 wrote:

I have a couple of PWMs for RNA-binding proteins, and I also have the sequences of candidate UTRs, in different groups, as a DNAStringSet. I'd like to see how many of the UTRs (and which ones) in each group match a given PWM. However, it looks like the matchPWM function in Biostrings only supports a single sequence rather than a DNAStringSet. Is there a way to do this other than looping through each sequence, or sticking all of my sequences together, matching, breaking them apart?

Thanks

23 months ago by
Mike Smith2.7k
EMBL Heidelberg / de.NBI
Mike Smith2.7k wrote:

I'm pretty sure you can use sapply for this, something along the lines of:

sapply(my_DNAStringSet, FUN = matchPWM, pwm = my_PWM)
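To get from that per-sequence result to "how many UTRs match, and which ones", something like the following sketch might help. The toy sequences, the motif, and the names utrs, pwm, n_hits and matching_utrs are all made up for illustration, and the "90%" threshold is only a placeholder:

```r
library(Biostrings)

# Toy inputs; the sequences, motif and object names are hypothetical.
utrs <- DNAStringSet(c(utr1 = "TTACGTACTT",
                       utr2 = "GGGGGGGG",
                       utr3 = "ACGTACACGTAC"))
pwm <- PWM(DNAStringSet(rep("ACGTAC", 4)))

# Run matchPWM() on one sequence at a time and keep only the hit count.
n_hits <- vapply(seq_along(utrs), function(i) {
  length(matchPWM(pwm, utrs[[i]], min.score = "90%"))
}, integer(1))
names(n_hits) <- names(utrs)

# Which UTRs have at least one hit, and how many UTRs that is.
matching_utrs <- names(n_hits)[n_hits > 0]
n_matching <- length(matching_utrs)
```

This is still one matchPWM() call per sequence, so it does nothing for speed; it only turns the output into counts and names.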
23 months ago by
Jake60
United States
Jake60 wrote:

That works, but it is incredibly slow once I start looping through all of my UTRs and even a few RNA-binding proteins. Is there another Bioconductor package, or an external program, that would be significantly faster?

Hi Jake,

Have you tried the "sticking all of my sequences together, matching, breaking them apart" approach? It should be significantly faster than looping.
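A rough sketch of what that could look like, assuming a named DNAStringSet and a PWM built with PWM(). The toy data, the "90%" threshold, and the object names (utrs, pwm, bounds, n_hits, matching_utrs) are invented for illustration. Hits that straddle the junction between two sequences have to be thrown away, which is what the "within" overlap is for:

```r
library(Biostrings)  # also attaches IRanges

# Toy inputs; the sequences, motif and object names are hypothetical.
# utr2 ends in "ACG" and utr3 starts with "TAC", so the concatenation
# contains a copy of the motif that spans a sequence boundary.
utrs <- DNAStringSet(c(utr1 = "TTACGTACTT",
                       utr2 = "GGGGGACG",
                       utr3 = "TACACGTACA"))
pwm <- PWM(DNAStringSet(rep("ACGTAC", 4)))

# 1. Stick the sequences together and record where each one lies.
combined <- unlist(utrs)                    # a single DNAString
bounds <- successiveIRanges(width(utrs))    # one range per original UTR

# 2. A single matchPWM() call over the concatenated sequence.
hits <- matchPWM(pwm, combined, min.score = "90%")
hit_ranges <- IRanges(start(hits), end(hits))

# 3. Break the hits apart: keep only those lying entirely within one
#    UTR, which drops any hit that spans a boundary.
ov <- findOverlaps(hit_ranges, bounds, type = "within")
n_hits <- tabulate(subjectHits(ov), nbins = length(utrs))
names(n_hits) <- names(utrs)
matching_utrs <- names(n_hits)[n_hits > 0]
```

The same combined and bounds objects can be reused for every PWM, so each additional RNA-binding protein costs only one more matchPWM() call plus the overlap step.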

H.