I am inspecting the proteinGroups file produced by MaxQuant, which has a column named "Majority protein IDs". According to Tyanova et al. (Nature Protocols, 2016), this column lists the proteins that contain at least half of the peptides assigned to the protein group. As a result, an entry in this column often holds several protein IDs.
When a table cell holds multiple protein IDs, which one should be selected for downstream analysis, mainly for assigning GO IDs and calculating GO enrichment?

1. Is it better to take only the first protein ID in each cell, which should be the best-supported one, since the IDs are sorted by the total number of identified peptides?
2. Or is it better to take all protein IDs in each cell? That would account for simultaneous translation of paralogs, at the cost of over-representing those that were not actually translated.
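To make the two options concrete, here is a minimal Python sketch of both selection strategies. It assumes the IDs within a cell are separated by semicolons, and the accessions in the example are made up for illustration. The function names are hypothetical and not part of MaxQuant or any library. It only shows how the two ID lists differ and does not say which one is more appropriate for enrichment.

```python
def split_ids(cell):
    """Split one 'Majority protein IDs' cell into a list of IDs.

    Assumption: IDs within a cell are separated by semicolons.
    """
    return [part.strip() for part in cell.split(";") if part.strip()]


def first_id_per_group(cells):
    """Option 1: keep only the first ID of each protein group."""
    selected = []
    for cell in cells:
        ids = split_ids(cell)
        if ids:  # skip empty cells
            selected.append(ids[0])
    return selected


def all_ids_per_group(cells):
    """Option 2: keep every ID, dropping duplicates but preserving order."""
    seen = set()
    selected = []
    for cell in cells:
        for protein_id in split_ids(cell):
            if protein_id not in seen:
                seen.add(protein_id)
                selected.append(protein_id)
    return selected


# Made-up example cells: one group with two IDs, one group with a single ID.
example_cells = ["P12345;Q67890", "O11111"]

first_only = first_id_per_group(example_cells)
all_ids = all_ids_per_group(example_cells)
```

With option 1 the GO test set has one entry per protein group, while with option 2 it grows by one entry for every extra ID in a cell, which is where the over-representation concern comes from.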