Question: matchPWM on DNAStringSet rather than just one sequence?
Jake (United States) wrote 18 months ago:

I have a couple of PWMs for RNA binding proteins, and I also have the sequences of candidate UTRs, split into different groups, as a DNAStringSet. I'd like to see how many of the UTRs (and which ones) in each group match a given PWM. However, it looks like the matchPWM function in Biostrings only supports a single sequence rather than a DNAStringSet. Is there a way to do this other than (a) sticking all of my sequences together, matching, and breaking them apart again, or (b) looping through each sequence?

Thanks

Mike Smith (EMBL Heidelberg / de.NBI) wrote 18 months ago:

I'm pretty sure you can use sapply for this, something along the lines of:  

sapply(my_DNAStringSet, FUN = matchPWM, pwm = my_PWM)
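
If only the number of hits per UTR is needed (to answer "how many, and which ones"), countPWM can be looped the same way. This is a minimal sketch, not a tuned solution: the toy sequences and the 4 x 3 matrix are made up for illustration and stand in for my_DNAStringSet and my_PWM, and min.score = "80%" is simply the matchPWM/countPWM default.

```r
library(Biostrings)

# Hypothetical toy inputs: three named "UTRs" and a matrix whose best match is TGC.
utrs <- DNAStringSet(c(utr1 = "ACGTTGCATG",
                       utr2 = "CCCCCCCC",
                       utr3 = "TTGCATTGCA"))
pwm <- matrix(c(0, 0, 0, 10,    # position 1 favours T
                0, 0, 10, 0,    # position 2 favours G
                0, 10, 0, 0),   # position 3 favours C
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# One countPWM() call per sequence; utrs[[i]] extracts a single DNAString.
n_hits <- vapply(seq_along(utrs),
                 function(i) countPWM(pwm, utrs[[i]], min.score = "80%"),
                 numeric(1))
names(n_hits) <- names(utrs)

n_matching_utrs <- sum(n_hits > 0)          # how many UTRs have at least one hit
matching_utrs   <- names(n_hits)[n_hits > 0] # which ones
```

This still makes one call per sequence, so it does not change how the run time scales with the number of UTRs.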
Jake (United States) wrote 18 months ago:

That works, but it is incredibly slow once I start looping through all of my UTRs and even a few RNA binding proteins. Is there another Bioconductor package, or a program outside Bioconductor, that would be significantly faster?


Hi Jake,

Have you tried the "sticking all of my sequences together, matching, breaking them apart" approach? It should be significantly faster than looping.
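
For reference, one way that approach could look is sketched below. It is only a sketch under assumptions: the toy sequences and matrix are made up, and the bookkeeping with successiveIRanges() and findOverlaps(type = "within") is one possible way to map hits back to their sequence of origin and to discard hits that straddle the junction between two sequences.

```r
library(Biostrings)

# Hypothetical toy inputs. utr1 ends in "TG" and utr2 starts with "C", so the
# concatenation contains one spurious TGC hit across that junction.
utrs <- DNAStringSet(c(utr1 = "ACGTTGCATG",
                       utr2 = "CCCCCCCC",
                       utr3 = "TTGCATTGCA"))
pwm <- matrix(c(0, 0, 0, 10,    # position 1 favours T
                0, 0, 10, 0,    # position 2 favours G
                0, 10, 0, 0),   # position 3 favours C
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# 1. Stick everything together and match once.
combined <- unlist(utrs)                          # a single DNAString
hits <- matchPWM(pwm, combined, min.score = "80%")

# 2. Work out where each original sequence sits inside `combined`.
bounds <- successiveIRanges(width(utrs))
hit_ranges <- IRanges(start(hits), end(hits))

# 3. Break apart: keep only hits lying entirely within one sequence.
ov <- findOverlaps(hit_ranges, bounds, type = "within")

hits_per_utr <- tabulate(subjectHits(ov), nbins = length(utrs))
names(hits_per_utr) <- names(utrs)
matched_utrs <- names(utrs)[hits_per_utr > 0]

# Hit start positions relative to their own sequence rather than to `combined`.
local_start <- start(hit_ranges)[queryHits(ov)] -
  start(bounds)[subjectHits(ov)] + 1L
```

The single matchPWM() call replaces one call per sequence, and the overlap step is vectorized, so the per-sequence overhead of the loop goes away.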

H.

Reply written 18 months ago by Hervé Pagès