I am searching the genome for occurrences of a very small motif:
library(BSgenome.Mmusculus.UCSC.mm10)
matches <- vmatchPattern(pattern = "TCAG", subject = BSgenome.Mmusculus.UCSC.mm10, exclude = c("M", "_"))
Because the motif is its own reverse complement, vmatchPattern returns matches on both the positive and the negative strand:
GRanges object with 13339240 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [3000191, 3000194] +
[2] chr1 [3000813, 3000816] +
[3] chr1 [3001048, 3001051] +
[4] chr1 [3001119, 3001122] +
[5] chr1 [3001795, 3001798] +
... ... ... ...
[13339236] chrY [90843570, 90843573] -
[13339237] chrY [90844087, 90844090] -
[13339238] chrY [90844334, 90844337] -
[13339239] chrY [90844496, 90844499] -
[13339240] chrY [90844695, 90844698] -
Instead, I would like it to return the matches on only one strand, either positive or negative. I could then simply reset the strand information to ambiguous:
strand(matches) <- "*"
Is there an option within vmatchPattern to return matches from a single strand, similar to matchPattern, which returns its result with ambiguous strand information? Granted, I could just filter out all the minus-strand (or plus-strand) ranges:
matches <- matches[strand(matches) == "+"]
I just wondered whether there is a way to do this within vmatchPattern itself that I am missing.
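For reference, here is a minimal sketch of the workaround on a small toy GRanges object, so it can be run without loading the whole genome. The object name toy and its coordinates are made up for illustration and only mimic the shape of the vmatchPattern result (the same sites reported once per strand); it assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Toy stand-in for the vmatchPattern() result: two sites, each reported
# once on the plus strand and once on the minus strand.
# The coordinates are invented for illustration only.
toy <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  ranges   = IRanges(start = c(101, 250, 101, 250), width = 4),
  strand   = c("+", "+", "-", "-")
)

# Keep only the plus-strand hits ...
single <- toy[strand(toy) == "+"]

# ... and then mark the strand as ambiguous.
strand(single) <- "*"

single
```

This leaves one unstranded range per site, which is the result I am after; I would just prefer to get it from vmatchPattern directly if that is possible.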
