Question

Combining two proteomics datasets with limpa

Andrew Pattison

Hi limpa authors,

Thanks for your efforts with this package. I have been getting into proteomics from a transcriptomics background, and it has been very useful so far.

I'm looking to compare groups from two separate E. coli proteomics datasets run in different batches with different sample prep. I have used DIA-NN with the same reference to get matching peptide inputs for limpa. There is one group that is the same between batches, so I was going to put a batch term into the design matrix to do a combined analysis.

My question is how do the detection probability curves (DPCs) work with this? I was going to calculate them per batch and then just use the protein values from dpcQuant() as input into a limma-trend pipeline, but then I think I would lose some of the information from the DPC uncertainty. So is there a more correct way to do this?

Cheers, Andrew

limpa ProteomicsWorkflow limma

Answer by Gordon Smyth · 2026-01-07

Hi Andrew,

We don't have that much experience combining different datasets with limpa yet, but my feeling is that the same DPC should be used for the whole analysis. The DPC should be a property of the proteomics technology and of the processing software settings (DIA-NN) rather than a property of the biological samples.

I would also rather that you used dpcDE() instead of limma-trend. If you have a lot of missing values, then it is important that you preserve the standard errors and observation counts recorded by dpcQuant() and that these values are propagated into the DE analysis by dpcDE(). dpcDE() keeps track of which proteins are entirely missing in which samples, and this affects the protein-wise variance estimates used for the linear models and t-tests. The limma-trend pipeline can't quite do that.

I assume that you have two precursor-level EList objects, y.peptide.batch1 and y.peptide.batch2, and that they contain the same peptides and proteins in the same order. You would have to do something about peptides detected in one batch but entirely missing in the other: either remove them from the first batch or add rows of missing values to the second. I assume that you've done that.
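For the "add rows of missing values" option, here is a minimal base-R sketch that pads two precursor-by-sample intensity matrices to the union of their precursor IDs and builds a matching annotation table. It assumes the matrices carry precursor IDs as row names and that the annotation data frames (with a Protein.Group column) use the same row names; padToUnion() and the toy data are hypothetical, not part of limpa or limma.

```r
# Hypothetical helper (not in limpa/limma): pad two precursor x sample matrices
# to the union of their row IDs, filling NA where a precursor was never seen.
padToUnion <- function(E1, E2, genes1, genes2) {
  ids <- union(rownames(E1), rownames(E2))
  pad <- function(E) {
    out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(E),
                  dimnames = list(ids, colnames(E)))
    out[rownames(E), ] <- E          # copy observed rows; the rest stay NA
    out
  }
  # Annotation for the union: batch 1 rows first, then rows only seen in batch 2
  genes <- rbind(genes1,
                 genes2[setdiff(rownames(genes2), rownames(genes1)), , drop = FALSE])
  list(E1 = pad(E1), E2 = pad(E2), genes = genes[ids, , drop = FALSE])
}

# Toy example with made-up precursor IDs and log2 intensities
E1 <- matrix(c(20.1, NA, 18.7, 19.2), nrow = 2,
             dimnames = list(c("pepA", "pepB"), c("s1", "s2")))
E2 <- matrix(c(21.0, 17.5, 20.8, NA), nrow = 2,
             dimnames = list(c("pepA", "pepC"), c("s3", "s4")))
genes1 <- data.frame(Protein.Group = c("P1", "P2"), row.names = c("pepA", "pepB"))
genes2 <- data.frame(Protein.Group = c("P1", "P3"), row.names = c("pepA", "pepC"))

aligned <- padToUnion(E1, E2, genes1, genes2)
aligned$E2   # pepB is entirely missing (all NA) in batch 2
```

In practice the padded matrices and annotation would be written back into the $E and $genes components of each batch's EList (any other row-wise components would need the same treatment), and the row names of the two $E matrices should be identical before calling cbind().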

The first approach is to estimate the DPC for all the data at once:

library(limpa)
# Combine the batches column-wise, then estimate one DPC from all samples
y.peptide <- cbind(y.peptide.batch1, y.peptide.batch2)
dpcest <- dpcCN(y.peptide)
y <- dpcQuant(y.peptide, protein.id="Protein.Group", dpc=dpcest)

Then optionally normalize between samples:

library(limma)  # normalizeQuantiles() is a limma function
ynorm <- y
ynorm$E <- normalizeQuantiles(ynorm$E)
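The optional normalization step is ordinary quantile normalization of the protein-level log-expression matrix. The base-R sketch below shows the basic idea on a made-up 4 x 3 matrix. quantileNormalize() is a hypothetical helper and is deliberately simpler than limma's normalizeQuantiles(), which averages tied values and can cope with missing values; it is only meant to illustrate what the step does.

```r
# Simplified textbook quantile normalization (ties broken by position).
# Hypothetical helper, NOT limma's normalizeQuantiles().
quantileNormalize <- function(E) {
  ref <- rowMeans(apply(E, 2, sort))                 # mean of each quantile across samples
  ranks <- apply(E, 2, rank, ties.method = "first")  # rank of each value within its sample
  out <- apply(ranks, 2, function(r) ref[r])         # replace value by its quantile's mean
  dimnames(out) <- dimnames(E)
  out
}

# Made-up protein x sample log-expression values
E <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4,
            dimnames = list(paste0("prot", 1:4), c("s1", "s2", "s3")))
En <- quantileNormalize(E)
apply(En, 2, sort)   # every sample now has the same set of sorted values
```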

Then proceed to the DE analysis with dpcDE().
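Because the plan is to put a batch term in the design matrix, it is worth checking that the batch effect is actually estimable. The base-R sketch below uses a hypothetical layout (two batches of six samples, groups A and B each unique to one batch, plus a Shared group present in both); the group names and sample sizes are made up. With the shared group the design is full rank, whereas without it batch is completely confounded with group.

```r
# Hypothetical layout: batch 1 = groups A and Shared, batch 2 = groups B and Shared,
# three samples per group per batch
targets <- data.frame(
  batch = factor(rep(c("b1", "b2"), each = 6)),
  group = factor(c(rep("A", 3), rep("Shared", 3), rep("B", 3), rep("Shared", 3)))
)
design <- model.matrix(~ 0 + group + batch, data = targets)
colnames(design)                          # groupA, groupB, groupShared, batchb2
qr(design)$rank == ncol(design)           # full rank: batch effect is estimable

# Without the shared group, batch is completely confounded with group
no_shared <- droplevels(targets[targets$group != "Shared", ])
design_bad <- model.matrix(~ 0 + group + batch, data = no_shared)
qr(design_bad)$rank < ncol(design_bad)    # rank deficient
```

A design like this would then be passed to dpcDE() together with the normalized protein-level object, for example dpcDE(ynorm, design), typically followed by limma's eBayes() and topTable(). A contrast such as groupB - groupA compares groups across batches only through the batch adjustment anchored by the shared group, so it relies on the batch effect being roughly the same for all groups.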

The second approach would be to run dpcQuant() batchwise but with the same DPC. This can be done by using a preset DPC slope. A DPC slope of around 0.7 or 0.8 is usually pretty safe for DIA-NN data.
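To make the DPC slope more concrete, here is a small sketch assuming the DPC is a logistic curve in log2 intensity, logit P(detected | y) = beta0 + beta1 * y, with dpc.slope playing the role of beta1. The intercept value is made up for illustration, and dpc_prob() is a hypothetical helper, not a limpa function.

```r
# Sketch assuming a logistic DPC: logit(P(detected | y)) = beta0 + beta1 * y,
# where y is log2 intensity and beta1 corresponds to dpc.slope.
# dpc_prob() is a hypothetical helper, not part of limpa.
dpc_prob <- function(y, beta0, beta1) plogis(beta0 + beta1 * y)

beta0 <- -8                      # made-up intercept, for illustration only
y <- c(6, 8, 10, 12, 14)         # log2 intensities
p08 <- dpc_prob(y, beta0, beta1 = 0.8)
round(rbind(log2.intensity = y, p.detected = p08), 3)

# Under this model the slope is a log odds ratio per log2 unit
# (i.e. per doubling of intensity)
exp(c(slope.0.7 = 0.7, slope.0.8 = 0.8))
```

Under this assumed form, a slope of 0.8 means each doubling of intensity multiplies the odds of detection by exp(0.8), roughly 2.2, while a slope of zero would correspond to missingness unrelated to intensity.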

y.batch1 <- dpcQuant(y.peptide.batch1, protein.id="Protein.Group", dpc.slope=0.8)
y.batch2 <- dpcQuant(y.peptide.batch2, protein.id="Protein.Group", dpc.slope=0.8)
y <- cbind(y.batch1, y.batch2)

Then optionally normalize between samples and proceed to DE analysis with dpcDE().

Given that you have run DIA-NN separately on the two batches, and given that you are planning to include a batch effect in the design matrix, I think that both of these approaches might work ok.

Regards, Gordon