I have a DIA experiment with 16 subjects and 96 samples in total (two independent groups, each subject measured at 6 time points). The data were first processed with the quantms Nextflow pipeline (--min_peptide_length 7 --max_peptide_length 30 --allowed_missing_cleavages 1 --targeted_only false --min_pr_mz 400 --max_pr_mz 1000), which produced the 'diann_report.tsv' output file. Its first row, transposed, looks like this:
File.Name "6007_TIMS3_RSLC-110m_TFA_DIA_031_1_BB1_1_36147.d"
Run "6007_TIMS3_RSLC-110m_TFA_DIA_031_1_BB1_1_36147"
Protein.Group "P36578"
Protein.Ids "P36578"
Protein.Names "RL4_HUMAN"
Genes "RPL4"
PG.Quantity "117.003"
PG.Normalised "103.357"
PG.MaxLFQ "463.488"
Genes.Quantity "117.003"
Genes.Normalised "103.357"
Genes.MaxLFQ "463.488"
Genes.MaxLFQ.Unique "463.488"
Modified.Sequence "AAAAAAALQAK"
Stripped.Sequence "AAAAAAALQAK"
Precursor.Id "AAAAAAALQAK2"
Precursor.Charge "2"
Q.Value "0.000165728"
PEP "0.012917"
Global.Q.Value "0.211434"
Protein.Q.Value "0.0979681"
PG.Q.Value "0.0983379"
Global.PG.Q.Value "0.00277264"
GG.Q.Value "0.0983379"
Translated.Q.Value "0"
Proteotypic "1"
Precursor.Quantity "117.003"
Precursor.Normalised "103.357"
Precursor.Translated "132.89"
Translated.Quality NA
Ms1.Translated "171.507"
Quantity.Quality "0.946464"
RT "35.1009"
RT.Start "35.0157"
RT.Stop "35.1577"
iRT "-13.7274"
Predicted.RT "35.0973"
Predicted.iRT "-13.7144"
First.Protein.Description "Large ribosomal subunit protein uL4"
Lib.Q.Value "0.00548697"
Lib.PG.Q.Value "0.00129199"
Ms1.Profile.Corr "0.649789"
Ms1.Area "151.004"
Evidence "1.59567"
Spectrum.Similarity "0.0838866"
Averagine "0.035785"
Mass.Evidence "1.36832"
CScore "0.987797"
Decoy.Evidence "0"
Decoy.CScore "-1e+07"
Fragment.Quant.Raw "117.003;0;0;0;63.0011;47.0011;74.0021;0;0;62.0011;56.0011;0;"
Fragment.Quant.Corrected "117.003;0;0;0;63.0011;47.0011;74.0021;0;0;62.0011;56.0011;0;"
Fragment.Correlations "0.946464;0;0;0;0.691234;0.244415;0.649222;0;0;0.691234;0.244415;0;"
MS2.Scan "37086"
IM "0.848068"
iIM "0.852708"
Predicted.IM "0.846738"
Predicted.iIM "0.853832"
I then imported the report into limpa:
library(limpa)
y.peptide <- readDIANN("diann_report.tsv", q.cutoffs = 0.01, q.columns = c("Q.Value","Lib.Q.Value","Lib.PG.Q.Value"))
print(dim(y.peptide))
[1] 16974 96
After import, the limma::plotDensities() plot looks like this:
Regarding missing values, I plotted the proportion of missing values per sample:
plot(colMeans(is.na(y.peptide$E)))
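To make the per-sample missing-value check concrete, here is a minimal base-R sketch on a toy matrix that stands in for y.peptide$E. The matrix `E`, the sample names, and the "median + 3 MAD" flagging threshold are all assumptions for illustration only; they are not part of limpa and not a recommended outlier rule.

```r
# Toy log-intensity matrix standing in for y.peptide$E (hypothetical data).
set.seed(1)
E <- matrix(rnorm(60, mean = 10), nrow = 10, ncol = 6,
            dimnames = list(paste0("pep", 1:10), paste0("S", 1:6)))

# Introduce missing values by hand so the example is deterministic.
E[1, c("S1", "S2")] <- NA
E[2, "S3"] <- NA
E[1:8, "S6"] <- NA   # S6 is deliberately much less complete

# Proportion of missing values per sample (column).
prop.na <- colMeans(is.na(E))

# Flag samples whose missingness is far above the typical sample.
# The threshold is an arbitrary illustrative choice.
threshold <- median(prop.na) + 3 * mad(prop.na)
flagged <- names(prop.na)[prop.na > threshold]

print(round(prop.na, 2))
print(flagged)
```

On real data, the same two lines (`colMeans(is.na(...))` and the comparison against a threshold) could be applied to y.peptide$E directly, and the flagged samples inspected alongside the density plot before deciding anything about removal.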
# some basic filters
y.peptide <- filterNonProteotypicPeptides(y.peptide)
y.peptide <- filterCompoundProteins(y.peptide)
print(dim(y.peptide))
[1] 15471 96
dpc() gives 0.63, dpcON(robust=TRUE) gives 0.64, and dpcCN() gives 1.08. I chose dpcCN:
y.protein <- dpcQuant(y.peptide, "Protein.Group", dpc=dpcCN(y.peptide))
dim(y.protein)
[1] 2002 96
The protein-level densities now look like this:
Should I normalize my data or remove outlying samples before using limpa? If so, at which stage: immediately after import, before dpc(), before dpcQuant(), or before dpcDE()?

Hi Gordon Smyth, thank you very much for the very fast and valuable response. I have edited the question to provide more details.