Gene-by-sample covariates in limma
1
0
Entering edit mode
c53aba27 • 0
@ca90afc7
Last seen 14 days ago
Canada

I am using limma to analyze mass spectrometry-based metabolomics data. I have a matrix of peaks (rows) by samples (columns). In this data, each unique metabolite is represented by multiple peaks (rows). Each metabolite has a major peak that contains only carbon-12 isotopes, but also has a series of minor peaks with 1, 2, 3, (etc.) carbon-13 isotopes.

I am interested in specifically testing whether a treatment increases the abundance of these minor carbon-13 peaks, relative to the major carbon-12 peak for each metabolite.

I can see a couple ways to do this:

  1. Create a peaks-by-samples matrix populated by peak ratios rather than peak abundances, such that each cell contains the ratio between the 13C peak and the 12C peak. For instance, a row in this matrix might represent the ratio of 13C6-glucose to 12C-glucose across samples. Then, run limma using a standard design matrix, providing the ratio matrix as input. However, whereas both the log-transformed 13C and 12C intensities are approximately normally distributed, the ratios between them are not. I believe this is inconsistent with the assumptions of the underlying linear model.
  2. Fit a linear model where the abundance of the major 12C peak is included as a covariate, e.g., intensity ~ treatment + intensity_12C. This seems like a preferable approach to me, but I don't see a straightforward way to do this with lmFit. Each 13C peak will have a different 12C peak intensity, so rather than a sample-level covariate like treatment, I have a covariate that changes for each peak. In effect, what I want to do is a bit like fitting a linear model to a given gene's expression, controlling for the expression of a second gene across the same samples.

My questions are:

  1. Is there any reason to prefer model 2 over model 1, or should I just go with the first, simpler approach?
  2. Is there a way to fit model 2 - including a second peaks (or, equivalently, genes) by samples matrix as a covariate - within limma?
  3. Digging into the limma source code, it seems to me that if I replace the lm.fit call within limma:::lm.series to fit separate linear models for each 13C peak and then reshape the output to match that of lm.fit, I could still call eBayes on the output. Is there any reason not to do this?

Thanks in advance.

Metabolomics limma • 153 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 32 minutes ago
WEHI, Melbourne, Australia

Your first method would be valid if, instead of taking ratios, you instead used the differences between the log-transformed 13C and 12C intensities. Note that differences in log-intensities is equivalent to taking ratios and then logging.

ADD COMMENT

Login before adding your answer.

Traffic: 688 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6