Hi all,
I'm working now on a method to try and plot a protein sequence after digested with one (or more) restriction enzyme(s).
the goal is to use a specific cleavage points and cut the protein sequence into snippets. After filtering the too long and too shots snippets, I would like to plot the remained peptides and get a table of the coverage.
Below I have attached a part of code for one pattern (trypsin cleaves the protein after a K)
as well as the image i would like to get.
Before I reinvent the wheel, I was wondering if there is already some other method of doing it and if so, please let me know.
Thanks
Assa
protein1 <- readAAStringSet(file="Protein1.fasta") protein1 <- toString(protein1) hits <- matchPattern("K", protein1) subseqs <- IRanges(PartitioningByEnd(end(hits))) subseqs.filt <- subseqs[width(subseqs) >=6 & width(subseqs) <= 30 ] plotRanges(subseqs.filt) AAcovered <- sum(as.data.frame(subseqs.filt)$width) AApeptides<- length(as.data.frame(subseqs.filt)$width) write.table(x = as.data.frame(subseqs.filt), file = "Protein1.txt", sep="\t", quote = F ) paste0("This gives a coverage ratio of ", format(AAcovered * 100 / hits@subject@length, trim = FALSE, digits=4 ), "%" )
thanks for the Gviz recommendation. I'll look into it.
For your comments, I'll try to answer the second remark first.
I'm converting the
AAStringSet
into String, as i can't seem to be able to usematchPattern
on a StringSet. Is there a better way of doing it? As for the first remark, because i am converting it to a string, i need to get the length of it back again. this is though only saved deep down in the hit object, which is why i am calling it like that.I have though another problem. This is the end of my protein sequence : ...
LQGRRRQPRGKKKVERN
because I am using the
PartitioningByEnd
overlap I miss the last four amino acids in myIRanges
object.Is there a way to paste the last piece of the protein (
VERN
) back to my Ranges object?so that it has 91 row and the last is
thanks
Assa
The AAStringSet consists of individual AAString objects. You can pull one out with
protein1[[1]]
and callmatchPattern()
on that. Thewidth()
accessor will get you the length of the string; no need to dig down deep into the object. I think the last issue is now answered by another thread.thanks for that. I think this covers that question. As for the second part, I still have some other problem, when the pattern in question should be the beginning and not the end pf the snippet looked for (A: convert a sequence to Ranges object).