plotting proteases digestion points in a protein
Entering edit mode
Assa Yeroslaviz ★ 1.5k
Last seen 3 months ago
Munich, Germany

Hi all,

I'm working now on a method to try and plot a protein sequence after digested with one (or more) restriction enzyme(s).

the goal is to use a specific cleavage points and cut the protein sequence into snippets. After filtering the too long and too shots snippets, I would like to plot the remained peptides and get a table of the coverage.

Below I have attached a part of code for one pattern (trypsin cleaves the protein after a K) as well as the image i would like to get. 

Before I reinvent the wheel, I was wondering if there is already some other method of doing it and if so, please let me know. 




protein1 <- readAAStringSet(file="Protein1.fasta")
protein1 <- toString(protein1)
hits <- matchPattern("K", protein1)
subseqs <- IRanges(PartitioningByEnd(end(hits)))
subseqs.filt <- subseqs[width(subseqs) >=6 & width(subseqs) <= 30 ]

AAcovered <- sum($width)
AApeptides<-  length($width)
write.table(x =, file = "Protein1.txt", sep="\t", quote = F )
paste0("This gives a coverage ratio of ", format(AAcovered * 100 / hits@subject@length, trim = FALSE, digits=4 ), "%" )



biostrings iranges sequence alignment protein • 604 views
Entering edit mode
Last seen 6 weeks ago
United States

You should look into the Gviz package for plotting, instead of using the simple function from the IRanges vignette.

For tabulating the coverage, it's not exactly clear what you mean, but your current approach seems basically fine. I'm curious though as to why you used the width() accessor in the first part of the code, and then switched to$width later, and then resorted to direct slot access. Or why you converted the AAStringSet to a character vector before matching, which would just turn it back into an XStringSet.

Entering edit mode

thanks for the Gviz recommendation. I'll look into it. 

For your comments, I'll try to answer the second remark first.

I'm converting the AAStringSet into String, as i can't seem to be able to use matchPattern on a StringSet. Is there a better way of doing it? As for the first remark, because i am converting it to a string, i need to get the length of it back again. this is though only saved deep down in the hit object, which is why i am calling it like that.

I have though another problem. This is the end of my protein sequence : ...LQGRRRQPRGKKKVERN

because I am using the PartitioningByEnd overlap I miss the last four amino acids in my IRanges object.

> hits
  Views on a 1834-letter BString subject
     start  end width
 [1]     2    2     1 [K]
 [2]    39   39     1 [K]
 [3]    52   52     1 [K]
 [4]    63   63     1 [K]
 [5]    64   64     1 [K]
 ...   ...  ...   ... ...
[86]  1769 1769     1 [K]
[87]  1817 1817     1 [K]
[88]  1828 1828     1 [K]
[89]  1829 1829     1 [K]
[90]  1830 1830     1 [K]

> subseqs
IRanges object with 90 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         2         2
   [2]         3        39        37
   [3]        40        52        13
   [4]        53        63        11
   [5]        64        64         1
   ...       ...       ...       ...
  [86]      1764      1769         6
  [87]      1770      1817        48
  [88]      1818      1828        11
  [89]      1829      1829         1
  [90]      1830      1830         1

Is there a way to paste the last piece of the protein (VERN) back to my Ranges object?

so that it has 91 row and the last is

  [91]      1831      1834         4




Entering edit mode

The AAStringSet consists of individual AAString objects. You can pull one out with protein1[[1]] and call matchPattern() on that. The width() accessor will get you the length of the string; no need to dig down deep into the object. I think the last issue is now answered by another thread.

Entering edit mode

thanks for that. I think this covers that question. As for the second part, I still have some other problem, when the pattern in question should be the beginning and not the end pf the snippet looked for (A: convert a sequence to Ranges object).


Login before adding your answer.

Traffic: 354 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6