Hi,
I would like to count the disjoint (non-overlapping) occurrences of a motif set (e.g. 3 motifs) in a given StringSet. In other words, each sequence should be scanned from left to right for matches of any motif in the set. Once a match is found, the next putative match may start directly after it (adjacent), but it must not overlap it.
I thought of calling gregexpr several times, once per motif, and then parsing and merging its output, but that seems pretty complicated to me. Does anybody know an easier way to do this?
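library(Biostrings)
data(yeastSEQCHR1)
yeast1 <- DNAString(yeastSEQCHR1)
# 20 random views of width 20; starts are capped so that every view stays inside the sequence
x <- Views(yeast1, start = sample(length(yeast1) - 19, 20), width = 20)
motif_set <- c("AAA", "ATT", "TTT")
# returns the start positions of the non-overlapping matches of a single pattern in each sequence
gregexpr("AAA", as.character(x))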
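
One possible way to do the left-to-right scan described above is to join the motifs into a single regular expression with `|`, so that one gregexpr call handles the whole set at once. This is only a minimal sketch in base R: `count_disjoint_motifs` is a made-up helper name (not a Biostrings function), it works on plain character vectors, and it assumes the motifs contain only plain letters (no regex metacharacters).

```r
# Hypothetical helper (not part of Biostrings): for each sequence, count the
# non-overlapping matches of any motif, scanning from left to right.
count_disjoint_motifs <- function(seqs, motifs) {
  # "AAA|ATT|TTT": a match of any one motif consumes its characters,
  # so the next match can only start after the previous one ends.
  pattern <- paste(motifs, collapse = "|")
  hits <- gregexpr(pattern, seqs)
  # gregexpr returns -1 when a sequence has no match, so count only
  # the positive start positions.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Toy sequences, used here only for illustration
seqs <- c("AAAATTTTT", "ACGTACGT", "AAAAAA")
count_disjoint_motifs(seqs, c("AAA", "ATT", "TTT"))
# returns 3 0 2
```

With the Views object from the question, something like count_disjoint_motifs(as.character(x), motif_set) should apply the same idea. If the motifs had different lengths, which motif wins at a given position would depend on the regex engine, so that case would need checking separately.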
