I am using limma to analyze mass spectrometry-based metabolomics data. I have a matrix of peaks (rows) by samples (columns). In this data, each unique metabolite is represented by multiple peaks (rows). Each metabolite has a major peak that contains only carbon-12 isotopes, but also has a series of minor peaks with 1, 2, 3, (etc.) carbon-13 isotopes.
I am interested in specifically testing whether a treatment increases the abundance of these minor carbon-13 peaks, relative to the major carbon-12 peak for each metabolite.
I can see a couple ways to do this:
- Create a peaks-by-samples matrix populated by peak ratios rather than peak abundances, such that each cell contains the ratio between the 13C peak and the 12C peak. For instance, a row in this matrix might represent the ratio of 13C6-glucose to 12C-glucose across samples. Then, run limma using a standard design matrix, providing the ratio matrix as input. However, whereas both the log-transformed 13C and 12C intensities are approximately normally distributed, the ratios between them are not. I believe this is inconsistent with the assumptions of the underlying linear model.
- Fit a linear model where the abundance of the major 12C peak is included as a covariate, e.g.,
intensity ~ treatment + intensity_12C
. This seems like a preferable approach to me, but I don't see a straightforward way to do this withlmFit
. Each 13C peak will have a different 12C peak intensity, so rather than a sample-level covariate like treatment, I have a covariate that changes for each peak. In effect, what I want to do is a bit like fitting a linear model to a given gene's expression, controlling for the expression of a second gene across the same samples.
My questions are:
- Is there any reason to prefer model 2 over model 1, or should I just go with the first, simpler approach?
- Is there a way to fit model 2 - including a second peaks (or, equivalently, genes) by samples matrix as a covariate - within limma?
- Digging into the limma source code, it seems to me that if I replace the
lm.fit
call withinlimma:::lm.series
to fit separate linear models for each 13C peak and then reshape the output to match that oflm.fit
, I could still call eBayes on the output. Is there any reason not to do this?
Thanks in advance.