matchPWM on DNAStringSet rather than just one sequence?
Jake
@jake-7236
United States

I have a couple of PWMs for RNA-binding proteins, and I also have the sequences of candidate UTRs, split into different groups, as a DNAStringSet. I'd like to see how many of the UTRs (and which ones) in each group match a given PWM. However, it looks like the matchPWM function in Biostrings only supports a single sequence rather than a DNAStringSet. Is there a way to do this besides sticking all of my sequences together, matching, breaking them apart, or looping through each sequence?

Thanks

Tag: biostrings
Mike Smith
@mike-smith
EMBL Heidelberg

I'm pretty sure you can use sapply for this, something along the lines of:  

sapply(my_DNAStringSet, FUN = matchPWM, pwm = my_PWM)
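A slightly fuller sketch of the same per-sequence idea, aimed at the original question (how many UTRs, and which ones, match in each group). It assumes only hit counts are needed, so it calls countPWM() from Biostrings instead of matchPWM(). The sequences, binding sites, group labels and the "90%" min.score threshold are all hypothetical, made up for illustration. It is still one call per UTR, so it should not be expected to run faster than the sapply() line above.

```r
library(Biostrings)

## Hypothetical toy data: three short "UTRs", a group label for each,
## and a PWM built from a few made-up aligned binding sites.
utrs <- DNAStringSet(c(utr1 = "ACGTTGCAACGTTGCAAC",
                       utr2 = "GGGGGGGGGGGGGGGGGG",
                       utr3 = "TTGCAATTGCAATTGCAA"))
utr_group <- factor(c("groupA", "groupA", "groupB"))
my_PWM <- PWM(DNAStringSet(c("TTGCAA", "TTGCAA", "TTGCAT", "TTGCAA")))

## One countPWM() call per UTR (still a loop under the hood);
## each call returns the number of PWM hits in that sequence.
hits_per_utr <- vapply(seq_along(utrs),
                       function(i) countPWM(my_PWM, utrs[[i]], min.score = "90%"),
                       numeric(1))
names(hits_per_utr) <- names(utrs)

## Which UTRs have at least one hit, and how many such UTRs per group.
matching_utrs <- names(hits_per_utr)[hits_per_utr > 0]
matches_per_group <- tapply(hits_per_utr > 0, utr_group, sum)
```

Replacing the toy objects with the real DNAStringSet, group factor and PWM gives the per-group summary directly; swapping countPWM() for matchPWM() returns the hit positions instead of counts.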
Jake
@jake-7236
United States

That works, but it is very slow once I loop over all of my UTRs for even a few RNA-binding proteins. Is there another Bioconductor package, or an external program, that would be significantly faster?


Hi Jake,

Have you tried the "sticking all of my sequences together, matching, breaking them apart" approach? It should be significantly faster than looping.
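One possible way to write the "stick together, match, break apart" approach, as a minimal sketch: the DNAStringSet is concatenated with unlist(), matchPWM() is called once on the result, and each hit is mapped back to its UTR with base R's findInterval(). Hits that run across the junction between two UTRs are dropped, which avoids having to insert spacer characters. The toy sequences, sites, threshold and object names are hypothetical, and the bookkeeping is one assumed way of doing the splitting, not necessarily the one intended here.

```r
library(Biostrings)

## Hypothetical toy data (same as a per-sequence example would use).
utrs <- DNAStringSet(c(utr1 = "ACGTTGCAACGTTGCAAC",
                       utr2 = "GGGGGGGGGGGGGGGGGG",
                       utr3 = "TTGCAATTGCAATTGCAA"))
my_PWM <- PWM(DNAStringSet(c("TTGCAA", "TTGCAA", "TTGCAT", "TTGCAA")))

## 1. Stick all sequences together into a single DNAString.
all_seq <- unlist(utrs)

## 2. Match once against the concatenated sequence.
hits <- matchPWM(my_PWM, all_seq, min.score = "90%")

## 3. Break the hits apart: find where each UTR starts/ends in the
##    concatenated sequence and assign each hit to the UTR it starts in.
utr_end <- cumsum(width(utrs))
utr_start <- utr_end - width(utrs) + 1L
hit_utr <- findInterval(start(hits), utr_start)

## Drop hits that spill over the junction into the next UTR.
keep <- end(hits) <= utr_end[hit_utr]
hit_utr <- hit_utr[keep]

## Per-UTR hit counts, plus hit positions relative to each UTR.
hits_per_utr <- tabulate(hit_utr, nbins = length(utrs))
names(hits_per_utr) <- names(utrs)
hit_table <- data.frame(utr   = names(utrs)[hit_utr],
                        start = start(hits)[keep] - utr_start[hit_utr] + 1L)
```

The speed gain, if any, would come from replacing thousands of small matchPWM() calls with one call on a long subject; the extra work afterwards is vectorized integer arithmetic.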

H.

