Hi all,
I'm working now on a method to try and plot a protein sequence after digested with one (or more) restriction enzyme(s).
the goal is to use a specific cleavage points and cut the protein sequence into snippets. After filtering the too long and too shots snippets, I would like to plot the remained peptides and get a table of the coverage.
Below I have attached a part of code for one pattern (trypsin cleaves the protein after a K) as well as the image i would like to get.
Before I reinvent the wheel, I was wondering if there is already some other method of doing it and if so, please let me know.
Thanks
Assa
protein1 <- readAAStringSet(file="Protein1.fasta")
protein1 <- toString(protein1)
hits <- matchPattern("K", protein1)
subseqs <- IRanges(PartitioningByEnd(end(hits)))
subseqs.filt <- subseqs[width(subseqs) >=6 & width(subseqs) <= 30 ]
plotRanges(subseqs.filt)
AAcovered <- sum(as.data.frame(subseqs.filt)$width)
AApeptides<- length(as.data.frame(subseqs.filt)$width)
write.table(x = as.data.frame(subseqs.filt), file = "Protein1.txt", sep="\t", quote = F )
paste0("This gives a coverage ratio of ", format(AAcovered * 100 / hits@subject@length, trim = FALSE, digits=4 ), "%" )


thanks for the Gviz recommendation. I'll look into it.
For your comments, I'll try to answer the second remark first.
I'm converting the
AAStringSetinto String, as i can't seem to be able to usematchPatternon a StringSet. Is there a better way of doing it? As for the first remark, because i am converting it to a string, i need to get the length of it back again. this is though only saved deep down in the hit object, which is why i am calling it like that.I have though another problem. This is the end of my protein sequence : ...
LQGRRRQPRGKKKVERNbecause I am using the
PartitioningByEndoverlap I miss the last four amino acids in myIRangesobject.> hits Views on a 1834-letter BString subject subject: MKLSVNEAQLGFYLGSLSHLSACPGIDPRSSED...DTHCITFTTAATRREKLQGRRRQPRGKKKVERN views: start end width [1] 2 2 1 [K] [2] 39 39 1 [K] [3] 52 52 1 [K] [4] 63 63 1 [K] [5] 64 64 1 [K] ... ... ... ... ... [86] 1769 1769 1 [K] [87] 1817 1817 1 [K] [88] 1828 1828 1 [K] [89] 1829 1829 1 [K] [90] 1830 1830 1 [K] > subseqs IRanges object with 90 ranges and 0 metadata columns: start end width <integer> <integer> <integer> [1] 1 2 2 [2] 3 39 37 [3] 40 52 13 [4] 53 63 11 [5] 64 64 1 ... ... ... ... [86] 1764 1769 6 [87] 1770 1817 48 [88] 1818 1828 11 [89] 1829 1829 1 [90] 1830 1830 1Is there a way to paste the last piece of the protein (
VERN) back to my Ranges object?so that it has 91 row and the last is
thanks
Assa
The AAStringSet consists of individual AAString objects. You can pull one out with
protein1[[1]]and callmatchPattern()on that. Thewidth()accessor will get you the length of the string; no need to dig down deep into the object. I think the last issue is now answered by another thread.thanks for that. I think this covers that question. As for the second part, I still have some other problem, when the pattern in question should be the beginning and not the end pf the snippet looked for (A: convert a sequence to Ranges object).