Search
Question: plotting proteases digestion points in a protein
0
gravatar for Assa Yeroslaviz
9 months ago by
Assa Yeroslaviz1.3k
Munich, Germany
Assa Yeroslaviz1.3k wrote:

Hi all,

I'm working now on a method to try and plot a protein sequence after digested with one (or more) restriction enzyme(s).

the goal is to use a specific cleavage points and cut the protein sequence into snippets. After filtering the too long and too shots snippets, I would like to plot the remained peptides and get a table of the coverage.

Below I have attached a part of code for one pattern (trypsin cleaves the protein after a K) as well as the image i would like to get. 

Before I reinvent the wheel, I was wondering if there is already some other method of doing it and if so, please let me know. 

Thanks

Assa

 

protein1 <- readAAStringSet(file="Protein1.fasta")
protein1 <- toString(protein1)
hits <- matchPattern("K", protein1)
subseqs <- IRanges(PartitioningByEnd(end(hits)))
subseqs.filt <- subseqs[width(subseqs) >=6 & width(subseqs) <= 30 ]
plotRanges(subseqs.filt)

AAcovered <- sum(as.data.frame(subseqs.filt)$width)
AApeptides<-  length(as.data.frame(subseqs.filt)$width)
write.table(x = as.data.frame(subseqs.filt), file = "Protein1.txt", sep="\t", quote = F )
paste0("This gives a coverage ratio of ", format(AAcovered * 100 / hits@subject@length, trim = FALSE, digits=4 ), "%" )

 

 

ADD COMMENTlink modified 9 months ago by Michael Lawrence9.8k • written 9 months ago by Assa Yeroslaviz1.3k
1
gravatar for Michael Lawrence
9 months ago by
United States
Michael Lawrence9.8k wrote:

You should look into the Gviz package for plotting, instead of using the simple function from the IRanges vignette.

For tabulating the coverage, it's not exactly clear what you mean, but your current approach seems basically fine. I'm curious though as to why you used the width() accessor in the first part of the code, and then switched to as.data.frame()$width later, and then resorted to direct slot access. Or why you converted the AAStringSet to a character vector before matching, which would just turn it back into an XStringSet.

ADD COMMENTlink written 9 months ago by Michael Lawrence9.8k

thanks for the Gviz recommendation. I'll look into it. 

For your comments, I'll try to answer the second remark first.

I'm converting the AAStringSet into String, as i can't seem to be able to use matchPattern on a StringSet. Is there a better way of doing it? As for the first remark, because i am converting it to a string, i need to get the length of it back again. this is though only saved deep down in the hit object, which is why i am calling it like that.

I have though another problem. This is the end of my protein sequence : ...LQGRRRQPRGKKKVERN

because I am using the PartitioningByEnd overlap I miss the last four amino acids in my IRanges object.

> hits
  Views on a 1834-letter BString subject
subject: MKLSVNEAQLGFYLGSLSHLSACPGIDPRSSED...DTHCITFTTAATRREKLQGRRRQPRGKKKVERN
views:
     start  end width
 [1]     2    2     1 [K]
 [2]    39   39     1 [K]
 [3]    52   52     1 [K]
 [4]    63   63     1 [K]
 [5]    64   64     1 [K]
 ...   ...  ...   ... ...
[86]  1769 1769     1 [K]
[87]  1817 1817     1 [K]
[88]  1828 1828     1 [K]
[89]  1829 1829     1 [K]
[90]  1830 1830     1 [K]

> subseqs
IRanges object with 90 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         2         2
   [2]         3        39        37
   [3]        40        52        13
   [4]        53        63        11
   [5]        64        64         1
   ...       ...       ...       ...
  [86]      1764      1769         6
  [87]      1770      1817        48
  [88]      1818      1828        11
  [89]      1829      1829         1
  [90]      1830      1830         1

Is there a way to paste the last piece of the protein (VERN) back to my Ranges object?

so that it has 91 row and the last is

  [91]      1831      1834         4

thanks

Assa

 

ADD REPLYlink written 9 months ago by Assa Yeroslaviz1.3k
1

The AAStringSet consists of individual AAString objects. You can pull one out with protein1[[1]] and call matchPattern() on that. The width() accessor will get you the length of the string; no need to dig down deep into the object. I think the last issue is now answered by another thread.

ADD REPLYlink modified 9 months ago • written 9 months ago by Michael Lawrence9.8k

thanks for that. I think this covers that question. As for the second part, I still have some other problem, when the pattern in question should be the beginning and not the end pf the snippet looked for (A: convert a sequence to Ranges object).

ADD REPLYlink written 9 months ago by Assa Yeroslaviz1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 165 users visited in the last hour