Question: "Wobble" patterns for genome searching using Biostrings
3.2 years ago, fongchunchan30 wrote:

Hi,

I've been going through the "Efficient genome searching with Biostrings and the BSgenome data packages" pdf document to get a grasp on how to search for a particular motif in a genome sequence.

Specifically, I am interested in looking for the RGYW motif, which has a "wobble" (I think this is the correct term) at positions 1, 3, and 4. So the R position can be either A or G, the Y position can be either C or T, and the W position can be either A or T.

For specifying the pattern, it seems that it has to be an explicit pattern (e.g. AGCA) for a Biostrings object. Is there any way to specify the pattern such that a given position could have multiple values? Something like [A/G]G[C/T][A/T]. The other solution would be to explicitly list out all of the patterns (e.g. AGCA, GGCA, etc.) and search for each one. But if there is a way to do this "wobble" pattern, it would save some time, especially for a long pattern.

Fong

modified 3.2 years ago • written 3.2 years ago by fongchunchan30
3.2 years ago, Steve Lianoglou wrote:

In the help page from ?Biostrings::matchPattern, under the description for the fixed parameter, I see:

If TRUE (the default), an IUPAC ambiguity code in the pattern can only match the same code in the subject, and vice versa. If FALSE, an IUPAC ambiguity code in the pattern can match any letter in the subject that is associated with the code, and vice versa. See ?lowlevel-matching for more information.

This should get you what you're after, no?
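
For what it's worth, here is a minimal sketch of how that could look. The subject sequence is a made-up toy example (not from any genome), and I'm assuming Biostrings is installed; in practice you would pass a chromosome from a BSgenome package as the subject instead.

```r
library(Biostrings)

# Toy subject sequence, invented purely for illustration
subject <- DNAString("AGCAGGTTACGT")

# RGYW written with IUPAC ambiguity codes:
# R = A/G, Y = C/T, W = A/T
pattern <- DNAString("RGYW")

# fixed = FALSE lets each ambiguity code in the pattern match any of
# the letters it stands for (and vice versa for codes in the subject)
hits <- matchPattern(pattern, subject, fixed = FALSE)

start(hits)         # start positions of the matches: 1 and 5
as.character(hits)  # matched sequences: "AGCA" and "GGTT"
```

Note that with fixed = FALSE, ambiguity codes in the subject (e.g. runs of N in a genome) are also treated as wildcards, so it may be worth checking the hits for those.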