I have two questions for you. First, does limpa assume that all experimental samples have been normalized to technical/process blanks before they are entered into the limpa workflow? If not, is this a step I should be performing before using limpa?
Second, when using Spectronaut v20 for PTM analysis, the user is provided with the option to use input normalization and PTM site stoichiometry to improve the accuracy of their PTM analysis results. "stoichiometry...reflects the proportion of a given PTM site that carries specific PTM and its unmodified levels." Input normalization is "...a process that adjusts PTM site quantities using data from pre-enrichment samples". Both definitions are quoted from the source linked below. Aside from selecting a different column in the qty.column argument of the readSpectronaut() function, are there other changes or considerations that the user should make to the limpa workflow when selecting these options in their PTM analysis?
He left some comments regarding normalization. Here is the relevant quote:
The EList object produced by dpcQuant(), containing protein log-expression values, can be normalized by normalizeBetweenArrays(), but we find that normalization is not always necessary. Upstream peptide quantification tools like DIA-NN seem to already do some normalization, which sometimes seems to us to be sufficient.
source (To Dr. Smyth: I wanted to thank you and say that I have great respect for your work, user support and the way you educate users like myself.)
The short answer is that limpa reads standard feature-level intensities from Spectronaut or other quantification tools. limpa is designed to work on intensities, not on ratios of intensities or on proportions. There is no need to do any special or artificial normalizations. An example limpa analysis with Spectronaut output is provided on the limpa documentation page https://github.com/SmythLab/limpa/ .
I don't know what you mean by "technical/process blanks". I asked my proteomics expert colleagues and they haven't heard of such a concept either. There is no mention of blanks in the Spectronaut manuals.
If you are actually referring to control samples, then limpa will make any required comparisons to the control samples as part of the design matrix, the same as would be done for any expression analysis. limpa expects to get control samples and treatment samples as separate columns.
You do not need to do any ad hoc normalizations yourself, and it would be wrong to do so.
Input normalization and PTM stoichiometry are new features in Spectronaut and we are not familiar with them yet. However, looking at the BIOGNOSYS web page, I am worried that these normalizations will interfere with limpa's assumptions. limpa is designed to analyse intensities, not ratios of intensities, and certainly not proportions on a 0 < x < 1 scale. As a statistician, I would much prefer to get the PTM and protein abundances separately, so that I can relate one to the other as part of the statistical analysis. I don't want Spectronaut to take ratios before passing the data to me, because then I cannot infer whether the original intensities were large or small in the first place. Similarly with flyability ratios. I would want to get intensities for modified sites and their unmodified counterparts separately, not pre-normalized into proportions in some proprietary way by Spectronaut. What if one of the intensities was NA? What sort of ratio or proportion would Spectronaut return then? I suspect we would lose information about what was detected and what was not.
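To make the concern concrete, here is a small Python sketch. It is not limpa or Spectronaut code: the function name `site_proportion` and all numbers are made up for illustration, and it assumes a proportion of the form modified / (modified + unmodified), which may not be how Spectronaut actually computes stoichiometry.

```python
def site_proportion(modified, unmodified):
    """Hypothetical proportion: modified / (modified + unmodified).

    Returns None when either intensity is missing (None).
    """
    if modified is None or unmodified is None:
        return None
    return modified / (modified + unmodified)

# Two sites with very different raw intensities (made-up numbers).
strong = site_proportion(8_000_000.0, 24_000_000.0)
weak = site_proportion(800.0, 2_400.0)

# One site whose unmodified counterpart was not detected.
missing = site_proportion(5_000.0, None)

print(strong, weak, missing)
```

Under these assumptions the strongly and weakly detected sites give the same proportion, so the intensity scale is no longer visible downstream, and the site with one missing intensity yields no proportion at all even though the modified form was detected.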
Thanks for the quick reply. The technical or process blank samples I'm referring to are part of a set of quality control samples that are run with every mass spec experiment that I've seen from my institution. The blank samples are not experimental samples, and contain only the solvent and reagents used during the experimental sample prep. Ideally, the blank samples should contain no detectable protein. If the blank samples do contain detectable proteins, it's assumed these detected proteins are contaminants acquired during the sample preparation. I've seen different ways of handling these contaminants in proteomic workflows. Often, the proteins identified in the blank samples are completely removed from the analysis. Or, any protein abundance value in an experimental sample that is not at least 3-5 times higher than the maximum protein abundance value found in blank sample(s) is converted to NA. I don't like using arbitrary cut-offs or removing data unnecessarily. I'd love to hear how other groups handle the potential contaminants.
It is fine for your mass spec facility to include blank samples as part of their internal QC, but such samples should not in my opinion be part of the bioinformatics proteomics workflow. I do not know of any proteomics facility that does the sort of interventions that you have described. IMO you should be including biological controls, not technical controls, in your analysis. If there was any contamination that was consistently introduced by the sample preparation, then it would cancel out of the DE analysis.
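A toy illustration of the cancellation argument, written in plain Python rather than limpa code. The intensities are invented, and the sketch assumes the sample-preparation effect is the same additive shift on the log2 scale in every sample; real contamination need not behave that way.

```python
from statistics import mean

# Made-up log2 intensities for one protein, before any sample-prep effect.
control = [20.0, 20.5, 19.5]
treated = [22.0, 22.5, 21.5]

# Assumption: sample prep shifts every sample by the same log2 amount.
prep_shift = 0.75

def log_fold_change(treated_values, control_values):
    """Difference of group means on the log2 scale."""
    return mean(treated_values) - mean(control_values)

clean_lfc = log_fold_change(treated, control)
shifted_lfc = log_fold_change(
    [x + prep_shift for x in treated],
    [x + prep_shift for x in control],
)

print(clean_lfc, shifted_lfc)
```

Both log fold changes are identical because the shared shift drops out of the difference between groups. If the effect differed between groups or samples, it would not cancel in this simple way.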
Including blank samples in the Spectronaut analysis and allowing Spectronaut to normalize between samples will give very bad results.
As far as limpa is concerned, limpa expects to get the actual precursor intensities from Spectronaut. You should not be making any ad hoc adjustments to the Spectronaut file before inputting it to limpa, and I am not quite sure how you could even if you wanted to. We recently used limpa to compare IP samples to IgG, where the IgG samples were almost like blanks. Even then, we did a standard limpa analysis without any need for anything ad hoc.
If you have reason to think that some particular protein is due to contamination, you could easily ignore or remove results for that protein from the downstream limpa results, but that is part of interpretation, not an ad hoc change that you make to the input data.
Hi Emily,
Just sharing a resource for handling contaminants - we use the methods described in Frankenfield et al. (2022) - download FASTA contaminant libraries from here and incorporate in your Spectronaut/DIA-NN/etc search, then remove any Protein Groups containing 'Cont_'
Flagging protein groups as likely contaminants in this way will be fine for a limpa analysis. The Cont_ proteins can optionally stay in for limpa DPC estimation, but should be filtered out before any normalization or DE steps.
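The filtering step described above could look something like the following sketch. It is plain Python for illustration only, not part of limpa or Spectronaut; the helper name `split_contaminants` and the protein group IDs are hypothetical, and the only thing taken from the thread is the 'Cont_' tag.

```python
def split_contaminants(protein_groups, tag="Cont_"):
    """Split protein group IDs into kept and flagged lists.

    A group is flagged if the tag appears anywhere in its ID, so groups
    that mix contaminant and non-contaminant members are flagged too.
    """
    kept = [pg for pg in protein_groups if tag not in pg]
    flagged = [pg for pg in protein_groups if tag in pg]
    return kept, flagged

# Hypothetical protein group IDs, as they might appear in a search report.
protein_groups = ["P12345", "Cont_P02769", "Q9Y6K9;Cont_P00761", "O43707"]

kept, flagged = split_contaminants(protein_groups)
print(kept)
print(flagged)
```

Keeping the flagged list rather than discarding it makes it easy to report which groups were removed, and to leave them in for an earlier step if that is preferred.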
Take care, Jay