Question: Return features from a single strand using vmatchPattern
Asked 23 months ago by jma1991:

I am searching the genome for occurrences of a very small motif:

matches <- vmatchPattern(pattern = "TCAG", subject = BSgenome.Mmusculus.UCSC.mm10, exclude = c("M", "_"))

Because the motif is its own reverse complement, the vmatchPattern function returns matches on both the positive and negative strands:

GRanges object with 13339240 ranges and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
         [1]     chr1   [3000191, 3000194]      +
         [2]     chr1   [3000813, 3000816]      +
         [3]     chr1   [3001048, 3001051]      +
         [4]     chr1   [3001119, 3001122]      +
         [5]     chr1   [3001795, 3001798]      +
         ...      ...                  ...    ...
  [13339236]     chrY [90843570, 90843573]      -
  [13339237]     chrY [90844087, 90844090]      -
  [13339238]     chrY [90844334, 90844337]      -
  [13339239]     chrY [90844496, 90844499]      -
  [13339240]     chrY [90844695, 90844698]      -

I would like instead for it to return all matches on either the positive or negative strand. From there I can then just reset the strand information to ambiguous:

strand(matches) <- "*"

Is there an option within vmatchPattern to return matches from a single strand, similar to matchPattern, which returns the result with ambiguous strand information? Granted, I could just filter out all minus or plus strand ranges:

matches <- matches[strand(matches) == "+"]

I just wondered if there was a way within the vmatchPattern function itself which I was missing?


Answered 23 months ago by Hervé Pagès (United States):


Note that TCAG is not a palindrome:

> motif <- DNAString("TCAG")
> reverseComplement(motif)
  4-letter "DNAString" instance
seq: CTGA
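
The same check can be wrapped in a small function that compares a motif with its reverse complement. This is only a sketch, assuming Biostrings is installed; the helper name is_revcomp_palindrome is hypothetical, and TGCA is included purely as an illustrative motif that does equal its own reverse complement.

```r
library(Biostrings)

# Hypothetical helper: TRUE when a DNA motif equals its own reverse complement
is_revcomp_palindrome <- function(motif) {
  motif <- DNAString(motif)
  as.character(motif) == as.character(reverseComplement(motif))
}

is_revcomp_palindrome("TCAG")  # FALSE: the reverse complement is CTGA
is_revcomp_palindrome("TGCA")  # TRUE: TGCA reads the same on both strands
```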


Anyway, to answer your question: no, vmatchPattern() doesn't have an option for searching only one strand of the genome. Filtering out the matches that belong to a given strand, like you did, is probably the easiest way to work around this.
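
As a rough illustration of that workaround, the sketch below filters a small hand-built GRanges object rather than the real genome-wide result. The coordinates are made up, and the object names matches and plus_only are just placeholders standing in for the output of vmatchPattern().

```r
library(GenomicRanges)

# Toy stand-in for the vmatchPattern() result (coordinates are made up)
matches <- GRanges(
  seqnames = c("chr1", "chr1", "chrY"),
  ranges   = IRanges(start = c(101, 250, 400), width = 4),
  strand   = c("+", "-", "-")
)

# Keep only the plus-strand hits
plus_only <- matches[strand(matches) == "+"]

# Optionally mark the retained hits as unstranded
strand(plus_only) <- "*"

plus_only
```

Because TCAG is not its own reverse complement, the minus-strand hits sit where CTGA occurs on the plus strand, so dropping them discards the motif's occurrences on the minus strand rather than removing duplicates.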


