Use of arrayWeights with two treatment groups
1
0
Entering edit mode
Emma • 0
@57c383ff
Last seen 10 months ago
United Kingdom

I am analysing some microarray data. For a set of participants, I have their RNA expression before an intervention ("Baseline") and after the intervention ("Followup"). I want to know which genes are differentially expressed between the two timepoints. As suggested in the Limma user guide when working with human in vivo data, I have used the arrayWeights() function to weight the arrays according to their quality. From what I understand, this function performs a regression on all samples and then weights the samples according to their distance from this regression. However, as there are two treatment groups ("Baseline" and "Followup"), I would like to know if it is suitable to use a single regression in this way, as I am expecting there to be differences between the two groups. Does it make sense instead to perform two array weightings - one for the "Baseline" samples and one for the "Followup" samples? If so, how would I go about altering my code to incorporate this?

My current code:

 y <- neqc(x)
design <- model.matrix(~Participant+Time)
arrayw <- arrayWeights(y)
fitw <- lmFit(y,design,weights=arrayw)
fitw <- eBayes(fitw)
topTable(fitw,coef="TimeFollowup")


Thanks in advance for any help!

Microarray arrays limma MicroarrayData • 877 views
2
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

Your code is fine. Differential expression does not affect the quality weights.

I'm not sure where you have read about a regression on all the samples, but that is not how the weights are computed.

0
Entering edit mode

Thanks for the answer! Just wondering, how does the function determine array quality?

1
Entering edit mode

There is a description and references in the help page.

Details:

The relative reliability of each array is estimated by measuring
how well the expression values for that array follow the linear
model. Arrays that tend to have larger residuals are assigned
lower weights.

The basic method is described by Ritchie et al (2006) and the
extension to custom variance models by Liu et al (2015). A
weighted linear model is fitted to the expression values for each
gene. The variance model is fitted to the squared residuals from
the linear model fit and is updated either by full REML scoring
iterations ('method="reml"') or using an efficient gene-by-gene
update algorithm ('method="genebygene"'). The final estimates of
these array variances are converted to weights. The gene-by-gene
algorithm is described by Ritchie et al (2006) while the REML
algorithm is an adaption of the algorithm of Smyth (2002).

For stability, the array weights are squeezed slightly towards
equality. This is done by adding a prior likelihood corresponding
to unit array weights equivalent to 'prior.n' genes. The
gene-by-gene algorithm is started from the prior genes while the
REML algorithm adds the prior to the log-likelihood derivatives.

By default, 'arrayWeights' chooses between the REML and
gene-by-gene algorithms automatically ('method="auto"'). REML is
chosen if there are no prior weights or missing values and
otherwise the gene-by-gene algorithm is used.

The input 'object' is interpreted as for 'lmFit' and 'getEAWP'. In
particular, the arguments 'design' and 'weights' will be extracted
from the data 'object' if available and do not normally need to be
set explicitly in the call; if any of these are set in the call
then they will over-ride the slots or components in the data
'object'.

Value:

A numeric vector of array weights, which multiply to 1.

Author(s):

Matthew Ritchie and Gordon Smyth

References:

Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S.,
Blewitt, M. E., Asselin-Labat, M.-L., Smyth, G. K., Ritchie, M. E.
(2015). Why weight? Combining voom with estimates of sample
quality improves power in RNA-seq analyses. _Nucleic Acids
Research_ 43, e97. <URL:
http://nar.oxfordjournals.org/content/43/15/e97>

Ritchie, M. E., Diyagama, D., Neilson, van Laar, R., J., Dobrovic,
A., Holloway, A., and Smyth, G. K. (2006). Empirical array quality
weights in the analysis of microarray data. _BMC Bioinformatics_
*7*, 261. <http://www.biomedcentral.com/1471-2105/7/261>

Smyth, G. K. (2002). An efficient algorithm for REML in
heteroscedastic regression. _Journal of Computational and
Graphical Statistics_ *11*, 836-847. <URL:
http://www.statsci.org/smyth/pubs/remlalgo.pdf>

0
Entering edit mode

Hi, Thanks a lot for your help!

Apologies, I'm still a bit confused -

As you write, "the relative reliability of each array is estimated by measuring how well the expression values for that array follow the linear model", but if I have two different treatment groups then I would expect the expression values to be different between the two groups - by weighting them according to how well they fit a linear model based on both treatment groups combined aren't I essentially adjusting out some of the effect of my treatment?

0
Entering edit mode

No, quite the opposite. The method clarifies the treatment effect rather than removing it.