Question

Gene-by-sample covariates in limma

0

Entering edit mode

c53aba27 • 0

@ca90afc7

Last seen 14 days ago

Canada

I am using limma to analyze mass spectrometry-based metabolomics data. I have a matrix of peaks (rows) by samples (columns). In this data, each unique metabolite is represented by multiple peaks (rows). Each metabolite has a major peak that contains only carbon-12 isotopes, but also has a series of minor peaks with 1, 2, 3, (etc.) carbon-13 isotopes.

I am interested in specifically testing whether a treatment increases the abundance of these minor carbon-13 peaks, relative to the major carbon-12 peak for each metabolite.

I can see a couple ways to do this:

Create a peaks-by-samples matrix populated by peak ratios rather than peak abundances, such that each cell contains the ratio between the 13C peak and the 12C peak. For instance, a row in this matrix might represent the ratio of 13C6-glucose to 12C-glucose across samples. Then, run limma using a standard design matrix, providing the ratio matrix as input. However, whereas both the log-transformed 13C and 12C intensities are approximately normally distributed, the ratios between them are not. I believe this is inconsistent with the assumptions of the underlying linear model.
Fit a linear model where the abundance of the major 12C peak is included as a covariate, e.g., intensity ~ treatment + intensity_12C. This seems like a preferable approach to me, but I don't see a straightforward way to do this with lmFit. Each 13C peak will have a different 12C peak intensity, so rather than a sample-level covariate like treatment, I have a covariate that changes for each peak. In effect, what I want to do is a bit like fitting a linear model to a given gene's expression, controlling for the expression of a second gene across the same samples.

My questions are:

Is there any reason to prefer model 2 over model 1, or should I just go with the first, simpler approach?
Is there a way to fit model 2 - including a second peaks (or, equivalently, genes) by samples matrix as a covariate - within limma?
Digging into the limma source code, it seems to me that if I replace the lm.fit call within limma:::lm.series to fit separate linear models for each 13C peak and then reshape the output to match that of lm.fit, I could still call eBayes on the output. Is there any reason not to do this?

Thanks in advance.

Metabolomics limma • 153 views

ADD COMMENT • link updated 14 days ago by Gordon Smyth 50k • written 14 days ago by c53aba27 • 0

score 0 · Answer 1 · 2024-05-02

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 32 minutes ago

WEHI, Melbourne, Australia

Your first method would be valid if, instead of taking ratios, you instead used the differences between the log-transformed 13C and 12C intensities. Note that differences in log-intensities is equivalent to taking ratios and then logging.

ADD COMMENT • link 14 days ago Gordon Smyth 50k