Question: matchPWM on DNAStringSet rather than just one sequence?
2.4 years ago by
Jake60
United States
Jake60 wrote:

I have a couple of PWMs for RNA-binding proteins, and I also have the sequences of candidate UTRs, in different groups, as a DNAStringSet. I'd like to see how many of the UTRs (and which ones) in each group match a given PWM. However, it looks like the matchPWM function in Biostrings only supports a single sequence rather than a DNAStringSet. Is there a way to do this other than (a) sticking all of my sequences together, matching, and breaking them apart again, or (b) looping through each sequence?

Thanks

modified 2.4 years ago • written 2.4 years ago by Jake60
2.4 years ago by
Mike Smith
EMBL Heidelberg / de.NBI
Mike Smith wrote:

I'm pretty sure you can use sapply for this, something along the lines of:

sapply(my_DNAStringSet, FUN = matchPWM, pwm = my_PWM)
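To get from per-sequence matching to "how many UTRs match, and which ones", something like the following might work. This is only a minimal sketch: the sequences, the PWM, the object names (utrs, pwm, n_hits) and the "90%" score cutoff are all made up for illustration, and it uses countPWM (the counting companion of matchPWM in Biostrings) inside an explicit loop over the set.

```r
library(Biostrings)

# Toy data, made up for illustration: three short "UTRs" and a PWM
# built from four aligned example sites.
utrs <- DNAStringSet(c(utr1 = "ACGTTTTGTAACGT",
                       utr2 = "CCCCCCCCCCCC",
                       utr3 = "TTTGTATTTGTA"))
pwm <- PWM(DNAStringSet(c("TTTGTA", "TTTGTA", "TTTGTT", "TATGTA")))

# countPWM() gives the number of hits in a single sequence,
# so loop over the set one DNAString at a time.
n_hits <- vapply(seq_along(utrs),
                 function(i) countPWM(pwm, utrs[[i]], min.score = "90%"),
                 numeric(1))
names(n_hits) <- names(utrs)

matching_utrs <- names(n_hits)[n_hits > 0]  # which UTRs have at least one hit
n_matching <- length(matching_utrs)         # how many UTRs match
```

This still visits every sequence individually, so it has the same scaling as the sapply call above; it only changes what is returned.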
2.4 years ago by
Jake60
United States
Jake60 wrote:

That works, but it is incredibly slow once I start looping through all of my UTRs and even a few RNA-binding proteins. Is there another Bioconductor package, or an external program, that would be significantly faster?

Hi Jake,

Have you tried the "sticking all of my sequences together, matching, breaking them apart" approach? It should be significantly faster than looping.

H.
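For reference, the "stick together, match, break apart" approach could be sketched roughly as below. This is an illustrative sketch, not a tested recipe: the toy sequences, the PWM, the variable names and the "90%" cutoff are assumptions, and any speed-up has not been measured here. The idea is to concatenate the set with unlist, call matchPWM once, then assign each hit back to its source sequence by coordinates, dropping hits that span a junction between two sequences.

```r
library(Biostrings)

# Toy data, made up for illustration.
utrs <- DNAStringSet(c(utr1 = "ACGTTTTGTAACGT",
                       utr2 = "CCCCCCCCCCCC",
                       utr3 = "TTTGTATTTGTA"))
pwm <- PWM(DNAStringSet(c("TTTGTA", "TTTGTA", "TTTGTT", "TATGTA")))

# 1. Stick everything together into one DNAString.
combined <- unlist(utrs)

# 2. Match once against the concatenated sequence.
hits <- matchPWM(pwm, combined, min.score = "90%")

# 3. Break apart: coordinates of each original sequence within 'combined'.
bounds <- IRanges(end = cumsum(width(utrs)), width = width(utrs))

# type = "within" keeps only hits lying entirely inside one sequence,
# which discards artificial hits that straddle a junction.
ov <- findOverlaps(IRanges(start(hits), end(hits)), bounds, type = "within")

# Hits per original sequence (zero for sequences without a hit).
n_hits <- tabulate(subjectHits(ov), nbins = length(utrs))
names(n_hits) <- names(utrs)

matching_utrs <- names(n_hits)[n_hits > 0]
```

The positions of the surviving hits relative to each UTR can be recovered by subtracting start(bounds) - 1 for the matched sequence, if the locations are needed rather than just the counts.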