Return features from a single strand using vmatchPattern
1
0
Entering edit mode
jma1991 ▴ 70
@jma1991-11856
Last seen 12 months ago
Cumbernauld

I am searching the genome for occurrences of a very small motif:

matches <- vmatchPattern(pattern = "TCAG", subject = BSgenome.Mmusculus.UCSC.mm10, exclude = c("M", "_"))

Because the motif is it's own reverse complement the vmatchPattern function returns matches on the positive and negative strand:

GRanges object with 13339240 ranges and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
         [1]     chr1   [3000191, 3000194]      +
         [2]     chr1   [3000813, 3000816]      +
         [3]     chr1   [3001048, 3001051]      +
         [4]     chr1   [3001119, 3001122]      +
         [5]     chr1   [3001795, 3001798]      +
         ...      ...                  ...    ...
  [13339236]     chrY [90843570, 90843573]      -
  [13339237]     chrY [90844087, 90844090]      -
  [13339238]     chrY [90844334, 90844337]      -
  [13339239]     chrY [90844496, 90844499]      -
  [13339240]     chrY [90844695, 90844698]      -

I would like instead for it to return all matches on either the positive or negative strand. From there I can then just reset the strand information to ambiguous:

strand(matches) <- "*"

Is there an option within vmatchPattern to return matches from a single strand? Similar to matchPattern which returns the result with ambiguous strand information. Granted, I could just filter out all minus or plus strand ranges:

matches <- matches[strand(matches) == "+"]

I just wondered if there was a way within the vmatchPattern function itself which I was missing?

 

biostrings genomicranges • 1.0k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 23 hours ago
Seattle, WA, United States

Hi,

Note that TCAG is not a palindrome:

> motif <- DNAString("TCAG")
> reverseComplement(motif)
  4-letter "DNAString" instance
seq: CTGA

 

Anyway, to answer your question: no, vmatchPattern() doesn't have an option for searching only one strand of the genome. Filtering out the matches that belong to a given strand like you did is probably the best/easiest  way to work around this.

Cheers,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 864 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6