Question

Difference between one-stage and two-stage analyses of multiple microarray experiments in limma

mwesthues • 0
I have multiple two-color microarray experiments, analyzed with the same technology, that are connected through common genotypes. I want to compute best linear unbiased estimates (BLUEs)/adjusted values for each RNA source across all experiments. I have tried a combined analysis in limma in a single step and a two-step approach using a linear mixed model package (lme4 in R). The resulting BLUEs are highly correlated, but their estimated standard deviations of the errors are only moderately correlated. How can this be explained?

Full description and question

The dataset that I would like to analyze in limma has the following properties:

Seven separate experiments, which share a common chip and image-reading technology.
All RNA sources across experiments are biological replicates.
All RNA sources that occur multiple times on the same array are technical replicates.
The different experiments are connected via some common genotypes, but not via common arrays.
The hybridization pairs were selected in a way that ensures that all RNA sources are connected (interwoven loop design).
A two-color design was used, where, for each experiment, every genotype was labeled once with "Cy3" and once with "Cy5", respectively.

Our goal is to compute best linear unbiased estimates (BLUEs)/adjusted values of gene expression for each RNA source. We are NOT interested in the differences in gene expression between RNA sources, as is usually the case. I see two options to achieve this goal:

1. A one-step approach where all arrays from every experiment are included in a single limma analysis. To account for the multiple experiments, an "Experiment" effect is added by augmenting the design matrix with one indicator column per experiment.

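    library(limma)
    # Genotype part: log-ratios of each RNA source against the first one
    rnasources <- uniqueTargets(targets)
    design <- modelMatrix(targets, ref = rnasources[1])
    # Experiment part: one indicator column per experiment (dd has one row per array)
    fact_mod <- model.matrix(~0 + Experiment, dd)
    design <- cbind(design, fact_mod)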

Here, dd is a data frame with one row per array whose Experiment column gives the experiment that array belongs to.

2. A two-step approach where each experiment is analyzed separately, and the BLUEs extracted from the individual experiments then enter a linear mixed model with an "Experiment" effect (fitted with, for example, lme4 or asreml).
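To make the one-step design concrete, here is a minimal sketch on a made-up layout. The targets frame below (three experiments, each a four-array loop over genotypes A to D) is purely hypothetical and much smaller than the real data; only modelMatrix and model.matrix are taken from the description above. The rank check at the end is one simple way to see whether the augmented design is estimable for a given layout.

```r
library(limma)

# Hypothetical toy targets: three experiments, each a four-array loop
# A -> B -> C -> D -> A, so every genotype is labeled once with each dye.
targets <- data.frame(
  Cy3 = rep(c("A", "B", "C", "D"), times = 3),
  Cy5 = rep(c("B", "C", "D", "A"), times = 3),
  Experiment = factor(rep(c("E1", "E2", "E3"), each = 4)),
  stringsAsFactors = FALSE
)

# Genotype part of the design: log-ratios of B, C and D against reference A.
geno_design <- modelMatrix(targets, ref = "A", verbose = FALSE)

# Experiment part: one indicator column per experiment.
exp_design <- model.matrix(~ 0 + Experiment, data = targets)

design <- cbind(geno_design, exp_design)

# A full-rank design has as many estimable coefficients as columns.
design_rank <- qr(design)$rank
is_full_rank <- design_rank == ncol(design)
```

If is_full_rank is FALSE for the real targets, some genotype or experiment coefficients are confounded and lmFit would return NA for them.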

I have run both analyses, and the LS-means from the one-stage and the two-stage analysis are very highly correlated (r = 0.97). However, the correlation between the estimated standard deviations of the errors (sigma in both lme4 and limma) is only 0.7, and I am wondering what causes this difference. Interestingly, the correlation between the best linear unbiased predictors (BLUPs) of the random "Experiment" effect in the two-stage model and the best linear unbiased estimates (BLUEs) of the fixed "Experiment" effect in the one-stage model is close to zero, which I did not expect. Two possible explanations I have considered:

The model matrix for the one-stage analysis is incorrect. I have read in other posts that it is fine to augment the design matrix by adding different levels of a "Batch" effect. I would consider this case to be comparable to mine, so I do not immediately see a problem here, but please correct me if I am wrong.
The precision of the sigmas is considerably higher for the one-stage approach compared to the two-stage approach, since there is simply more information available for the estimation of the BLUEs.

r limma limma design matrix • 1.7k views

ADD COMMENT • link updated 8.7 years ago by Gordon Smyth 50k • written 8.7 years ago by mwesthues • 0

I assume that you're referring to the BLUEs of the coefficients when you're talking about "RNA sources". The best estimate of the observed expression value for each sample would be, well, the observed expression value.

ADD REPLY • link 8.7 years ago Aaron Lun ★ 28k

Yes, I am referring to the log-ratio between a given RNA source and my chosen reference for each gene.

ADD REPLY • link 8.7 years ago mwesthues • 0

score 0 · Answer 1 · 2015-08-18

Aaron Lun ★ 28k
I'm not familiar with lme4, but if you're plugging the estimated coefficients from limma directly into the lme4 functions, then I presume that you're treating the estimates as known observations. This doesn't account for the uncertainty of estimation in the first fit with limma. I suppose that, if such uncertainty were included (possibly as precision weights), this would affect the standard deviations that you get from lme4.

Another reason may be that limma treats Experiment as a fixed effect, i.e., the corresponding coefficients are not considered to originate from some underlying distribution. Treating it as a random effect would likely change the standard deviations of all coefficient estimates, as the certainty of those estimates would vary according to the likelihood of obtaining the estimated Experiment coefficients from the underlying distribution.
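As a rough illustration of the precision-weights idea, the sketch below fits the second stage for a single gene with and without weights. Everything in it is an assumption for illustration: the stage1 data frame is simulated rather than taken from limma output, the column names est and se are made up, and using 1/se^2 as the weight is just one common choice. lmer, fixef and sigma are the lme4 functions used.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for stage-one output of a single gene: one estimate and
# one standard error per genotype and experiment (all values are made up).
stage1 <- expand.grid(
  Genotype = factor(paste0("G", 1:5)),
  Experiment = factor(paste0("E", 1:7))
)
exp_shift <- rnorm(7, sd = 1)[as.integer(stage1$Experiment)]
stage1$se <- runif(nrow(stage1), min = 0.2, max = 0.6)
stage1$est <- as.integer(stage1$Genotype) + exp_shift +
  rnorm(nrow(stage1), sd = stage1$se)

# Unweighted: stage-one estimates are treated as if observed without error.
fit_plain <- lmer(est ~ 0 + Genotype + (1 | Experiment), data = stage1)

# Weighted: precise stage-one estimates count for more (weights = 1 / se^2).
fit_weighted <- lmer(est ~ 0 + Genotype + (1 | Experiment), data = stage1,
                     weights = 1 / se^2)

blue_plain <- fixef(fit_plain)
blue_weighted <- fixef(fit_weighted)
sigma_plain <- sigma(fit_plain)
sigma_weighted <- sigma(fit_weighted)
```

Note that the two sigma values are not on the same scale: with prior weights, the residual variance of an observation is sigma^2 divided by its weight, so sigma_weighted should not be compared directly with sigma_plain or with limma's sigma.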

ADD COMMENT • link 8.7 years ago Aaron Lun ★ 28k

Thanks for your answer! That sounds reasonable.

My assumption is that limma has no option to treat such experiment effects as random. Or am I wrong about that?
Would you change anything about my one-stage approach, or does it seem correct to you?

ADD REPLY • link 8.7 years ago mwesthues • 0

You could try blocking on Experiment with duplicateCorrelation rather than putting it in the design matrix. Recall that the latter approach does not consider the variability between Experiment coefficients. With the duplicateCorrelation method, this variability is now incorporated into the variance estimates for each gene. This might be closer to a random effects model, as the variability between experiments will be reflected in the estimation of the other coefficients. However, I'm not enough of an expert in this to be sure about it, so take it with a grain of salt.
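For what it's worth, a minimal sketch of the blocking approach could look like the following. The targets layout and the log-ratio matrix M are simulated and hypothetical, so this only shows the mechanics, not what the result would be on the real data. Note that Experiment is left out of the design matrix and passed as block instead.

```r
library(limma)

set.seed(1)

# Hypothetical toy layout: three experiments, each a four-array loop.
targets <- data.frame(
  Cy3 = rep(c("A", "B", "C", "D"), times = 3),
  Cy5 = rep(c("B", "C", "D", "A"), times = 3),
  Experiment = factor(rep(c("E1", "E2", "E3"), each = 4)),
  stringsAsFactors = FALSE
)

# Genotype-only design; Experiment is NOT added as columns here.
design <- modelMatrix(targets, ref = "A", verbose = FALSE)

# Simulated log-ratios (genes x arrays) with a shared shift per experiment.
n_genes <- 200
noise <- matrix(rnorm(n_genes * nrow(targets), sd = 0.3), nrow = n_genes)
shift <- matrix(rnorm(n_genes * 3, sd = 0.3), nrow = n_genes)
M <- noise + shift[, as.integer(targets$Experiment)]

# Estimate the consensus within-experiment correlation, then pass it to lmFit.
corfit <- duplicateCorrelation(M, design, block = targets$Experiment)
fit <- lmFit(M, design, block = targets$Experiment,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
```

One design consequence to keep in mind: duplicateCorrelation estimates a single consensus correlation shared by all genes, whereas a per-gene mixed model in lme4 estimates a separate Experiment variance for each gene, so the two need not agree closely.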

ADD REPLY • link 8.7 years ago Aaron Lun ★ 28k