Given that vmatchPattern doesn't work with a vector as the pattern and vmatchPDict isn't implemented, are there alternatives in R that don't involve using short read mapping algorithms and building indexes?
Maybe you can try AhoCorasickTrie; it would be good to know if this works for your purposes.
Thanks for the suggestion. However, AhoCorasickSearch currently supports neither mismatches nor indels, so I doubt that it would be useful for many genomics applications. I notice that my question is basically the same as matching one AAStringSet against another AAStringSet. It might be a common use case worth an optimised solution in Biostrings.
Good to know about the limitations. Another possibility is to 'unlist' one of the StringSets into a single *String separated by nonsense (e.g., poly-N), match against that, then relist the result as appropriate.
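A minimal sketch of that unlist/match/relist idea, assuming DNA input and mismatches only (no indels). The toy sequences, the names patterns, subjects, max_mm and find_hits, and the choice of a poly-N spacer as long as the longest pattern are all illustrative assumptions, not something from the thread. It relies on matchPattern() treating N in the subject literally under the default fixed=TRUE, so the spacer always counts as mismatches. Note that it still loops over the patterns, so it only removes the loop over subjects.

```r
library(Biostrings)

# Toy data (hypothetical): two patterns, three subject sequences.
patterns <- DNAStringSet(c(p1 = "ACGT", p2 = "TTGA"))
subjects <- DNAStringSet(c(s1 = "AAACGTAA", s2 = "CCTTGACC", s3 = "ACGATTGA"))
max_mm <- 1L

# Poly-N spacer as long as the longest pattern, so that no match window
# can touch two different subject sequences.
sep <- strrep("N", max(width(patterns)))

# "Unlist": join all subjects into one DNAString, each followed by the spacer.
concatenated <- DNAString(paste0(as.character(subjects), sep, collapse = ""))

# Start and end of every original sequence inside the concatenated string.
seg_width <- width(subjects)
seg_start <- cumsum(c(1L, head(seg_width + nchar(sep), -1L)))
seg_end   <- seg_start + seg_width - 1L

# Match one pattern, then "relist": map each hit back to its source sequence
# and drop hits that overhang into a spacer or beyond the string limits.
find_hits <- function(pattern) {
  hits <- matchPattern(pattern, concatenated, max.mismatch = max_mm)
  seg  <- findInterval(start(hits), seg_start)
  keep <- seg > 0L & end(hits) <= seg_end[pmax(seg, 1L)]
  seg  <- seg[keep]
  data.frame(
    subject = names(subjects)[seg],
    start   = start(hits)[keep] - seg_start[seg] + 1L,
    end     = end(hits)[keep] - seg_start[seg] + 1L
  )
}

results <- lapply(seq_along(patterns), function(i) find_hits(patterns[[i]]))
names(results) <- names(patterns)
results
```

Each element of results is a data frame of hits in per-sequence coordinates. For protein input the same layout should carry over with AAStringSet/AAString and a spacer character that cannot occur in the patterns, but that variant is untested here.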