Hi Andrew,
We don't have that much experience combining different datasets with limpa yet, but my feeling is that the same DPC should be used for the whole analysis. The DPC should be a property of the proteomics technology and of the processing software settings (DIA-NN) rather than a property of the biological samples.
I would also rather that you used dpcDE() instead of limma-trend. If you have a lot of missing values, then it is important that you preserve the standard errors and observation counts recorded by dpcQuant() and that these values are propagated into the DE analysis by dpcDE(). dpcDE() keeps track of which proteins are entirely missing in which samples, and this affects the protein-wise variance estimates used for the linear models and t-tests. The limma-trend pipeline can't quite do that.
I assume that you have two precursor-level EList objects, y.peptide.batch1 and y.peptide.batch2, and that they contain the same peptides and proteins in the same order. You would have to do something about peptides detected in one batch but entirely missing in the other, either removing them from the batch in which they were detected or adding rows of missing values to the other batch. I assume that you've done that.
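In case it helps, here is a rough sketch of that alignment step. It uses small made-up matrices standing in for the $E component of each batch rather than real EList objects, and the helper expandRows() is just a name chosen for illustration. With real EList objects, the $genes annotation (including the Protein.Group column) would need to be subsetted or expanded in the same way.

```r
# Toy log-intensity matrices standing in for the $E component of each batch
# (hypothetical data; rows are precursors, columns are samples).
E1 <- matrix(c(10, 11, 12, 13, 14, 15), nrow = 3,
             dimnames = list(c("pepA", "pepB", "pepC"), c("s1", "s2")))
E2 <- matrix(c(20, 21, 22, 23, 24, 25), nrow = 3,
             dimnames = list(c("pepB", "pepC", "pepD"), c("s3", "s4")))

# Option 1: keep only precursors present in both batches.
common <- intersect(rownames(E1), rownames(E2))
E1.common <- E1[common, , drop = FALSE]
E2.common <- E2[common, , drop = FALSE]

# Option 2: expand both batches to the union of precursors,
# filling rows that a batch does not have with NA.
expandRows <- function(E, ids) {
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(E),
                dimnames = list(ids, colnames(E)))
  out[rownames(E), ] <- E
  out
}
all.ids <- union(rownames(E1), rownames(E2))
E1.full <- expandRows(E1, all.ids)
E2.full <- expandRows(E2, all.ids)

# Either way, the row order must match before the batches are combined.
stopifnot(identical(rownames(E1.common), rownames(E2.common)))
stopifnot(identical(rownames(E1.full), rownames(E2.full)))
```

Option 1 discards precursors seen in only one batch, while option 2 keeps them as entirely missing rows in the other batch.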
The first approach is to estimate the DPC for all the data at once:
y.peptide <- cbind(y.peptide.batch1, y.peptide.batch2)
dpcest <- dpcCN(y.peptide)
y <- dpcQuant(y.peptide, protein.id="Protein.Group", dpc=dpcest)
Then optionally normalize between samples:
ynorm <- y
ynorm$E <- normalizeQuantiles(ynorm$E)
Then proceed to the DE analysis with dpcDE().
The second approach would be to run dpcQuant() batchwise but with the same DPC.
This can be done by using a preset DPC slope.
A DPC slope of around 0.7 or 0.8 is usually pretty safe for DIA-NN data.
y.batch1 <- dpcQuant(y.peptide.batch1, protein.id="Protein.Group", dpc.slope=0.8)
y.batch2 <- dpcQuant(y.peptide.batch2, protein.id="Protein.Group", dpc.slope=0.8)
y <- cbind(y.batch1, y.batch2)
Then optionally normalize between samples and proceed to DE analysis with dpcDE().
Given that you have run DIA-NN separately on the two batches, and given that you are planning to include a batch effect in the design matrix, I think that both of these approaches might work ok.
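For the batch term in the design matrix, a minimal sketch might look like this. The sample layout (four samples per batch, two groups within each batch) and the factor names batch and group are made up for illustration, so substitute your own sample annotation.

```r
# Hypothetical sample annotation: 4 samples per batch, 2 groups in each batch.
# The order must match the columns of the combined protein-level object.
batch <- factor(rep(c("b1", "b2"), each = 4))
group <- factor(rep(c("ctrl", "treat"), times = 4))

# Additive model: batch is adjusted for, group is the effect of interest.
design <- model.matrix(~ batch + group)
colnames(design)

# With limpa and limma loaded, the DE step would then be along these lines
# (use y instead of ynorm if you skipped the normalization step):
# fit <- dpcDE(ynorm, design)
# fit <- eBayes(fit)
# topTable(fit, coef = "grouptreat")
```

The coefficient name "grouptreat" follows from the hypothetical factor levels used here and will differ for your own data.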
Regards,
Gordon