Hi all, after a few days of searching and trial and error, I would like to ask for your help.
I have a protein sequence, let's say this one (one line):
>protein1
MKLSVNEAQLGFYLGSIDPRSSEDQPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGERKTQKAASNGTQIRSKLKRTKQTATKTKTLQGPAEKKPPSGSQAPRTKQRVTKWQ
I would like to split the protein after each occurrence of a specific amino acid, let's say "K" (a cleavage point of trypsin), so that I get a list (or an IRanges object with the start and end positions) with these elements:
MK LSVNEAQLGFYLGSIDPRSSEDQPESLK TGQMMDESDEDFK ELCASFFQRVK ... PPSGSQAPRTK QRVTK WQ...
Using IRanges and matchPattern(), I was only able to create an object of the pattern I'm looking for, but not of the sub-sequences.
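To make the desired output concrete, here is a minimal sketch in plain Python (not the IRanges/matchPattern() solution asked for). The function name `cleave_after` is made up for illustration, and the coordinates are 1-based and inclusive so that they could be fed into a start/end range object afterwards.

```python
def cleave_after(sequence, residue="K"):
    """Split `sequence` after every occurrence of `residue`.

    Returns (start, end, peptide) tuples with 1-based, inclusive
    coordinates. `cleave_after` is a hypothetical helper, not part
    of any package.
    """
    peptides = []
    start = 1
    for pos, aa in enumerate(sequence, start=1):
        if aa == residue:
            # Close the current peptide at the cleavage residue.
            peptides.append((start, pos, sequence[start - 1:pos]))
            start = pos + 1
    if start <= len(sequence):
        # Trailing peptide that does not end in the cleavage residue.
        peptides.append((start, len(sequence), sequence[start - 1:]))
    return peptides


# The example protein from this post, with the line break removed.
protein1 = ("MKLSVNEAQLGFYLGSIDPRSSEDQPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKE"
            "VSGERKTQKAASNGTQIRSKLKRTKQTATKTKTLQGPAEKKPPSGSQAPRTKQRVTKWQ")

for start, end, peptide in cleave_after(protein1, "K"):
    print(start, end, peptide)
```

Note that two adjacent cleavage residues (e.g. "KK") produce a single-residue peptide "K" in this sketch; whether that is wanted depends on the downstream analysis.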
Then I would like to plot these sub-sequences onto the complete sequence of the protein.
The end goal of my analysis is to plot the protein sequence (x-axis) against all cleavage patterns (y-axis), which should then look like the attached image.
The bottom line represents the protein; each of the rows above stands for one specific peptide. The idea is to calculate which protease or combination of proteases gives the highest coverage of the protein in question.
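The coverage idea could be sketched roughly as follows, again in plain Python. Several things here are assumptions and not part of the question: a complete digest always covers 100% of the sequence, so the sketch only counts peptides whose length falls between `min_len` and `max_len` (arbitrary illustrative thresholds); a "combination" is taken to mean separate digests whose covered positions are pooled; and the cleavage rules are toy placeholders, not real protease specificities. The function names are hypothetical.

```python
from itertools import combinations


def covered_positions(sequence, residues, min_len=6, max_len=30):
    """1-based positions covered by peptides of acceptable length after
    cleaving behind every residue contained in `residues`."""
    covered, start = set(), 1
    cuts = [i for i, aa in enumerate(sequence, start=1) if aa in residues]
    # A virtual cut after the last residue closes the final peptide.
    if not cuts or cuts[-1] != len(sequence):
        cuts.append(len(sequence))
    for end in cuts:
        if min_len <= end - start + 1 <= max_len:
            covered.update(range(start, end + 1))
        start = end + 1
    return covered


def best_combination(sequence, rules, min_len=6, max_len=30, max_size=2):
    """Return (rule names, covered fraction) for the best pooled digest."""
    per_rule = {name: covered_positions(sequence, res, min_len, max_len)
                for name, res in rules.items()}
    best = ((), 0.0)
    for size in range(1, max_size + 1):
        for names in combinations(per_rule, size):
            union = set().union(*(per_rule[n] for n in names))
            fraction = len(union) / len(sequence)
            if fraction > best[1]:
                best = (names, fraction)
    return best


protein1 = ("MKLSVNEAQLGFYLGSIDPRSSEDQPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKE"
            "VSGERKTQKAASNGTQIRSKLKRTKQTATKTKTLQGPAEKKPPSGSQAPRTKQRVTKWQ")

# Toy cleavage rules (placeholders for real protease specificities).
rules = {"after K": "K", "after R": "R", "after E": "E"}
print(best_combination(protein1, rules))
```

The per-rule position sets are also what a plot of peptides against the protein axis would need: each range can be drawn as a horizontal segment on its own row.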
I would really like to know if there are any packages out there that deal with this kind of question/problem, as I have not found any.
I hope I have made myself clear enough, and I am of course grateful for any help I can get. Thanks a lot in advance.
Assa

