Hi, I am analyzing an untargeted LC-MS lipidomics data set (without standards) of ~150 lipids from 4 experimental groups (2 genotypes × 2 treatments, factorial design) with 3 samples per group. We are interested in various contrasts, but especially in the genotype effect within each treatment group. I am new to MS data, but based on my experience with similar RNA-Seq study designs, I thought that limma-trend might work well. I see some support for that idea in various posts here and elsewhere, but the normalization procedure is not so clear to me. Would log2 followed by quantile normalization be the "best" way to start, as suggested here? Or sum normalization followed by log and then z-scaling, as here? Or perhaps even TMM normalization, as one possibility implied here? I have also had median normalization followed by log suggested to me. Everyone seems to agree on taking the log, but I wonder whether we need to start with sum or median normalization, and then what, if anything, to do after the log transformation.
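For concreteness, here is a minimal base-R sketch of the sum and median options, run on a made-up matrix `intens` (hypothetical; rows are lipids, columns are samples, values are positive intensities). Both options divide each sample by a single scale factor before the log, so on the log2 scale they differ from each other only by a per-sample constant. The sketch also checks that, with an odd number of rows, dividing by the median and then taking log2 gives the same result as taking log2 and then subtracting each sample's median.

```r
# Hypothetical toy matrix: 15 lipids (rows) x 4 samples (columns), positive intensities.
set.seed(1)
intens <- matrix(rlnorm(15 * 4, meanlog = 10), nrow = 15,
                 dimnames = list(paste0("lipid", 1:15), paste0("s", 1:4)))

# Sum normalization: divide each sample (column) by its total intensity, then log2.
sum.norm <- log2(sweep(intens, 2, colSums(intens), FUN = "/"))

# Median normalization: divide each sample by its median intensity, then log2.
med.norm <- log2(sweep(intens, 2, apply(intens, 2, median), FUN = "/"))

# Both are per-sample rescalings, so within each column they differ only by a constant.
offset <- med.norm - sum.norm
apply(offset, 2, function(x) diff(range(x)))  # ~0 for every sample

# Median-centring on the log scale gives the same values (exactly so here, 15 rows is odd).
log.int <- log2(intens)
med.center <- sweep(log.int, 2, apply(log.int, 2, median), FUN = "-")
max(abs(med.center - med.norm))  # ~0
```

Quantile normalization, by contrast, forces every sample to share the same distribution rather than only shifting each sample by a constant.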
Below is a minimal example showing each normalization in context, as I currently understand it. I have analyzed my data a few ways and the results turn out not to matter much for this data set, but I am curious which you would argue is the most defensible in principle. Judging from the numerous papers and programs presenting various normalization methods for MS data, the field does not seem to have settled on the "best" approach, but I wonder what the leading contenders might be for use with limma and whether I am implementing them correctly.
It may be worth adding that we have little missing data, so the few lipids with missing values were simply excluded. In addition to the ~150 named lipids, we have >1000 unnamed compounds ("features") that I have been excluding but could combine with the named lipids into a single analysis if appropriate (though I am not sure what we would learn from them if they were significant).
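A minimal base-R sketch of the filtering described above, on a hypothetical matrix `named` (made-up lipid names and values, assuming missing values are coded as `NA`): keep only lipids observed in every sample. If the unnamed features were ever combined with the named lipids, `rbind()` would need them measured on the same samples in the same column order; `unnamed` is likewise hypothetical.

```r
# Hypothetical named-lipid matrix (rows = lipids, columns = samples) with two NAs.
named <- matrix(c(5, 6, NA, 8,
                  1, 2, 3, 4,
                  7, NA, 9, 10), nrow = 3, byrow = TRUE,
                dimnames = list(c("PC 34:1", "PE 36:2", "TG 52:2"), paste0("s", 1:4)))

# Keep only lipids (rows) with no missing value in any sample.
complete <- named[rowSums(is.na(named)) == 0, , drop = FALSE]
rownames(complete)  # "PE 36:2"

# Hypothetical unnamed feature whose columns are stored in a different order;
# indexing by colnames(complete) aligns the samples before stacking the rows.
unnamed <- matrix(c(2, 3, 4, 5), nrow = 1,
                  dimnames = list("feature_001", c("s4", "s3", "s2", "s1")))
combined <- rbind(complete, unnamed[, colnames(complete), drop = FALSE])
```

If missing values were instead recorded as zeros, the same filter would test `named > 0` rather than `is.na(named)`.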
Thanks for your help!
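## Organize data and design
library(limma)
library(edgeR)
# data: lipids x samples matrix of positive intensities; sample.info: one row per sample
cnt <- DGEList(data)
cnt$samples$group <- group <- factor(sample.info$Group)
design <- model.matrix(~0+group)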
## Normalization options
# 1) log2 and quantile normalization
data.norm <- normalizeQuantiles(log2(data))
# 2) Sum, log, and z: not entirely sure about this one
data.temp <- sweep(data, 2, colSums(data), FUN="/")  # divide each sample (column) by its total intensity
data.norm <- t(scale(t(log2(data.temp))))            # z-score each lipid (row) across samples
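# 3) TMM
cnt <- calcNormFactors(cnt, method = "TMM")
# cpm(log = TRUE) rescales by the TMM-adjusted library sizes and adds a small prior count (default 2) before log2
data.norm <- cpm(cnt, log=TRUE)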
## Define contrasts and fit model
contr.matrix <- makeContrasts(AvBinX=groupX.A-groupX.B, AvBinY=groupY.A-groupY.B, levels = colnames(design))
tfit <- lmFit(data.norm, design)
tfit <- contrasts.fit(tfit, contr.matrix)
efit <- eBayes(tfit, trend=TRUE, robust=TRUE)
# etc., e.g. topTable(efit, coef = "AvBinX")
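A small base-R sketch of what the row-wise z-scaling in option 2 does, using a made-up matrix `logint` of log2 intensities (hypothetical data): after `t(scale(t(x)))`, every lipid has mean 0 and standard deviation 1 across samples, whatever its original spread. This seems relevant to limma-trend, since `eBayes(trend = TRUE)` models each lipid's variance against its average log intensity, and after this transform those averages are all 0; the fitted coefficients are also in standard-deviation units rather than log2 fold changes.

```r
# Hypothetical log2 intensities: 5 lipids (rows) x 6 samples, each lipid with a different spread.
set.seed(2)
sds <- c(0.1, 0.2, 0.5, 1, 2)
# rnorm recycles sd along the column-major fill, so row i always gets sds[i].
logint <- matrix(rnorm(5 * 6, mean = 20, sd = rep(sds, times = 6)), nrow = 5)

# Row-wise z-scaling as in option 2: centre and scale each lipid across samples.
z <- t(scale(t(logint)))

round(apply(logint, 1, sd), 2)  # spread differs between lipids
apply(z, 1, sd)                 # 1 for every lipid
rowMeans(z)                     # ~0 for every lipid
```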