Question: extract codon position via BSgenome
0
gravatar for k.rajain1212
16 months ago by
k.rajain12120 wrote:

I have extracted a particular pattern position say TAT from a particular chromosome via BSgenome in R. Then I mapped my codon file with gtf file to check how many patterns are present in a particular gene. But what I found is, my codon number is crossing the length of gene. I am assuming why this happened is because 

Lets assume that the sequence is "ATATATATGCAT" and its taking start and end position like this:

start      end

2              4

4              6

6             8

can I avoid this? Here what I want is,  if once ant position is read it won't go back to trace the pattern.

 

bsgenome • 263 views
ADD COMMENTlink modified 15 months ago by Hervé Pagès ♦♦ 14k • written 16 months ago by k.rajain12120
Answer: extract codon position via BSgenome
0
gravatar for Hervé Pagès
15 months ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi,

In other words you want to skip matches that overlap with a previous match. Note that this is what regular expressions do: 

> gregexpr("TAT", "ATATATATGCAT")
[[1]]
[1] 2 6
attr(,"match.length")
[1] 3 3
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

But in your case it seems that you want to be even more restrictive by comparing codons only i.e. by comparing TAT with the codons in coding sequence ATATATATGCAT. Assuming that the phase of the coding sequence is 0, you can extract the set of codons with the codons() function from the Biostrings package:

> library(Biostrings)
> cds <- DNAString("ATATATATGCAT")
> codons <- codons(cds)
> codons
  Views on a 12-letter DNAString subject
subject: ATATATATGCAT
views:
    start end width
[1]     1   3     3 [ATA]
[2]     4   6     3 [TAT]
[3]     7   9     3 [ATG]
[4]    10  12     3 [CAT]

Then you can just compare this Views object with TAT:

> codons == DNAString("TAT")
[1] FALSE  TRUE FALSE FALSE
> which(codons == DNAString("TAT"))
[1] 2                                      # 2nd codon only is TAT codon
> start(codons)[codons == DNAString("TAT")]
[1] 4                                      # starts of TAT codons

Hope this helps,

H.

ADD COMMENTlink modified 15 months ago • written 15 months ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 91 users visited in the last hour