I was searching Bioconductor for a function that assembles peptides into protein groups (i.e. protein inference). The problem and my preferred solution are nicely described in Figure 1 of the Zhang et al. IDPicker paper (https://www.ncbi.nlm.nih.gov/pubmed/17676885).
What are your suggestions?
Kind regards, Daniel
My investigation so far
I looked at MSnID::infer_parsimonious_accessions(). Here, no grouping of equivalent proteins/peptides occurs (step B in the figure). Internally, the which.max() call picks only the first of several equally scoring protein accessions, so the result depends on the order of the input. Ideally, I would like to keep this equally good information.
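To make the tie problem concrete, here is a minimal sketch in Python. It is not the MSnID code: the protein names, the peptide counts and both helper functions are made up for illustration. It only mimics the "first maximum wins" behaviour that R's which.max() has, and contrasts it with keeping all tied candidates.

```python
def pick_first_best(counts):
    """Return the first protein with the maximal peptide count.

    Python's max() returns the first maximal item it encounters, and a dict
    iterates in insertion order, so ties are broken by input order.
    """
    return max(counts, key=counts.get)


def pick_all_best(counts):
    """Return every protein that shares the maximal peptide count."""
    best = max(counts.values())
    return sorted(name for name, n in counts.items() if n == best)


# Hypothetical peptide counts: protA and protB are tied.
counts_ab = {"protA": 2, "protB": 2, "protC": 1}
# Same data, different input order.
counts_ba = {"protB": 2, "protA": 2, "protC": 1}

print(pick_first_best(counts_ab))  # protA
print(pick_first_best(counts_ba))  # protB
print(pick_all_best(counts_ab))    # ['protA', 'protB']
print(pick_all_best(counts_ba))    # ['protA', 'protB']
```

Keeping all tied candidates gives the same answer for both input orders, which is the behaviour I would like to have.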
In the example from the figure (step D, middle cluster) the following difference would occur:
- MSnID::infer_parsimonious_accessions() gets 2 protein groups: "pro4,9" with "pep2;pep10" and "pro6" with "pep6"
- IDPicker gets 1 protein group: "pro4,9;pro6" with 3 peptides: "pep2;pep6;pep10"
- Deciding on the order within a protein group needs more information from the measurement and is not part of this question.
- For intensity aggregation by protein group later on, i.e. after step D in the figure, the MSnbase::combineFeatures() function seems to be a good way.
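The grouping I am missing (step B in the figure) could be sketched roughly as below, again in Python and only as an illustration. The function group_indistinguishable() and the toy protein-to-peptide map are hypothetical. They are not taken from IDPicker, MSnID or the figure. The idea is simply that proteins with exactly the same peptide set end up in one group.

```python
from collections import defaultdict


def group_indistinguishable(protein_to_peptides):
    """Group proteins that are supported by exactly the same peptide set.

    Returns a dict mapping a sorted tuple of protein names to the sorted
    list of peptides those proteins share.
    """
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        # frozenset is hashable and ignores peptide order
        by_peptide_set[frozenset(peptides)].append(protein)
    return {
        tuple(sorted(proteins)): sorted(peptides)
        for peptides, proteins in by_peptide_set.items()
    }


# Hypothetical identifications: protA and protB cannot be told apart.
toy = {
    "protA": ["pep1", "pep2"],
    "protB": ["pep2", "pep1"],
    "protC": ["pep3"],
}

groups = group_indistinguishable(toy)
for proteins, peptides in groups.items():
    print(",".join(proteins), "->", ";".join(peptides))
```

With groups like these as the unit, a later parsimony step would no longer have to choose arbitrarily between members that have identical evidence.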