Question

limma - technical replicates: duplicateCorrelation() or avereps()?


Guido Hooiveld

Wageningen University, Wageningen, the …

I am about to analyze an Affymetrix experiment that consists of repeated measurements (cells from the same individual measured multiple times) as well as technical replicates for most of the samples (i.e. the same RNA sample was labelled and hybridized twice).

For the repeated measurements I know I should use a paired analysis (i.e. block on SubjectID by including it in the model), but I am unsure how best to use the information from the technical replicates.

According to some posts on this site, the technical replicates can be handled with duplicateCorrelation(), using the TechRep column as the block (e.g. see A: Duplicate Correlation with technical replicates), whereas other posts recommend averaging them with avereps() (e.g. see A: Including technical and biological replicates in Limma block analysis?).
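A minimal sketch of the duplicateCorrelation() option, assuming a log-expression matrix with one column per array in the same order as the targets file shown below. The expression values are simulated (hypothetical), not the real data, and duplicateCorrelation() needs the statmod package installed. SubjectID enters the design as a fixed effect for the pairing, and TechRep is passed as `block`.

```r
library(limma)  # duplicateCorrelation() also needs the statmod package

# Sample information from the targets file in this question (filenames omitted)
targets <- data.frame(
  Condition = factor(rep(c("Ctrl", "Treatment"), each = 10)),
  SubjectID = factor(rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2)),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Hypothetical simulated log-expression data, 500 genes x 20 arrays:
# an effect shared by arrays from the same RNA sample plus array-level noise
set.seed(1)
n_genes <- 500
rna_effect <- matrix(rnorm(n_genes * 12), n_genes, 12)
y <- rna_effect[, targets$TechRep] +
  matrix(rnorm(n_genes * 20, sd = 0.5), n_genes, 20)

# Paired design: SubjectID as a fixed effect, Condition as the effect of interest
design <- model.matrix(~ SubjectID + Condition, data = targets)

# Correlation between arrays sharing a TechRep ID, pooled over genes
corfit <- duplicateCorrelation(y, design, block = targets$TechRep)

fit <- lmFit(y, design, block = targets$TechRep,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
print(topTable(fit, coef = "ConditionTreatment", number = 5))
```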

Any suggestions would be appreciated.

Thanks, Guido

Part of the targets file.

> targets
                 Filename Condition SubjectID TechRep
1   G141_B09_MUSCLE_6.CEL      Ctrl         1       1
2   G141_C09_MUSCLE_7.CEL      Ctrl         1       1
3   G141_D09_MUSCLE_8.CEL      Ctrl         2       2
4   G141_E07_MUSCLE_1.CEL      Ctrl         2       2
5   G141_F07_MUSCLE_2.CEL      Ctrl         3       3
6  G141_F09_MUSCLE_10.CEL      Ctrl         3       3
7   G141_G07_MUSCLE_3.CEL      Ctrl         4       4
8  G141_G09_MUSCLE_11.CEL      Ctrl         4       4
9   G141_H07_MUSCLE_4.CEL      Ctrl         5       5
10 G141_H09_MUSCLE_12.CEL      Ctrl         6       6
11 G141_B09_MUSCLE_16.CEL Treatment         1       7
12 G142_C09_MUSCLE_17.CEL Treatment         1       7
13 G142_D09_MUSCLE_18.CEL Treatment         2       8
14 G142_E07_MUSCLE_11.CEL Treatment         2       8
15 G142_F07_MUSCLE_12.CEL Treatment         3       9
16 G142_F09_MUSCLE_20.CEL Treatment         3       9
17 G142_G07_MUSCLE_13.CEL Treatment         4      10
18 G142_G09_MUSCLE_21.CEL Treatment         4      10
19 G142_H07_MUSCLE_14.CEL Treatment         5      11
20 G142_H09_MUSCLE_22.CEL Treatment         6      12
>
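A quick base-R check of the structure in this targets table (filenames omitted): how many arrays each RNA sample (TechRep) has, and whether every TechRep ID falls within a single subject and a single condition. If that nesting holds, averaging the arrays of each TechRep never combines arrays from different subjects or conditions. The helper name `n_distinct_per_rep` is hypothetical.

```r
# Sample information from the targets table above (filenames omitted)
targets <- data.frame(
  Condition = factor(rep(c("Ctrl", "Treatment"), each = 10)),
  SubjectID = factor(rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2)),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Number of arrays per RNA sample: 2 = hybridized twice, 1 = single array
arrays_per_sample <- table(targets$TechRep)
print(arrays_per_sample)

# Count distinct values of a column within each TechRep ID (hypothetical helper)
n_distinct_per_rep <- function(column) {
  tapply(as.character(targets[[column]]), targets$TechRep,
         function(v) length(unique(v)))
}

# TRUE if each RNA sample belongs to exactly one subject and one condition
nested <- all(n_distinct_per_rep("SubjectID") == 1) &&
  all(n_distinct_per_rep("Condition") == 1)
print(nested)
```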

Written 8.5 years ago by Guido Hooiveld; updated 8.5 years ago by Gordon Smyth.

score 1 · Answer 1 · 2016-09-30

Gordon Smyth

WEHI, Melbourne, Australia

I would use avereps(). Using duplicateCorrelation() will lead to a somewhat liberal analysis (with p-values too small). duplicateCorrelation() is intended for more complicated situations in which the technical replicates can't be averaged without losing some information about the treatments.
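A minimal sketch of the averaging route, assuming a log-expression matrix with columns in the same order as the targets rows (simulated here, hypothetical). To average columns (arrays) rather than rows (probes), limma provides avearrays(), the array-wise counterpart of avereps(); the paired design is then fitted to the 12 averaged RNA samples.

```r
library(limma)

# Sample information from the targets file in the question (filenames omitted)
targets <- data.frame(
  Condition = factor(rep(c("Ctrl", "Treatment"), each = 10)),
  SubjectID = factor(rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2)),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Hypothetical simulated log-expression data, 500 genes x 20 arrays
set.seed(1)
y <- matrix(rnorm(500 * 20), 500, 20)

# Average the arrays of each RNA sample (columns sharing a TechRep ID);
# the averaged columns are named by TechRep ID
yave <- avearrays(y, ID = targets$TechRep)

# One row of sample information per averaged column, in the same order
targets_ave <- targets[!duplicated(targets$TechRep), ]

# Paired analysis on the 12 averaged samples
design <- model.matrix(~ SubjectID + Condition, data = targets_ave)
fit <- eBayes(lmFit(yave, design))
print(topTable(fit, coef = "ConditionTreatment", number = 5))
```

Because every TechRep ID sits within one subject and one condition, this averaging loses no information about the treatment comparison.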

If you are worried that you have n=2 technical replicates for some samples and n=1 for others, you could run arrayWeights() on the averaged data. That will compensate for different precisions if they exist.
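Continuing the same hypothetical setup, a sketch of estimating array quality weights on the averaged data and passing them to lmFit(). arrayWeights() returns one weight per column and is not told which samples were averaged, so any difference in precision has to show up in the data.

```r
library(limma)

# Sample information from the targets file in the question (filenames omitted)
targets <- data.frame(
  Condition = factor(rep(c("Ctrl", "Treatment"), each = 10)),
  SubjectID = factor(rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2)),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

set.seed(1)
y <- matrix(rnorm(500 * 20), 500, 20)  # hypothetical simulated data
yave <- avearrays(y, ID = targets$TechRep)
targets_ave <- targets[!duplicated(targets$TechRep), ]
design <- model.matrix(~ SubjectID + Condition, data = targets_ave)

# One empirical quality weight per averaged sample, estimated from the
# residual variability of each column under this design
aw <- arrayWeights(yave, design)
print(round(aw, 2))

# Less precise samples get less influence in the linear model fit
fit <- eBayes(lmFit(yave, design, weights = aw))
print(topTable(fit, coef = "ConditionTreatment", number = 5))
```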

Answered 8.5 years ago by Gordon Smyth

OK, thanks! However, since I don't fully understand, would you mind briefly explaining, or giving a small example of, what you mean by a "more complicated situation in which the technical replicates can't be averaged without losing some information about the treatments"?

Reply by Guido Hooiveld, 8.5 years ago

Dear Gordon,

Sorry to revive this thread, but I just read a letter to the editor of Biostatistics in which the authors implicitly claim the opposite of what you suggested: that it would be 'better' to use duplicateCorrelation() to handle technical replicates than to use avereps() (or avearrays()). Interestingly, in doing so they cite a number of your papers and mailing list posts.

Letter: http://dx.doi.org/10.1093/biostatistics/kxw031

The authors' conclusion is based on the number of significant genes identified in a data set, but I realize that if you don't know "the truth" (as is usually the case in omics experiments), one could question the relevance of this read-out, especially since you stated that using duplicateCorrelation() leads to a liberal analysis. On the other hand, the authors found similar numbers of significant genes with two other methods, although the overlap between these gene lists is not reported.

Since the topic of how to handle technical replicates regularly pops up on this list, I would appreciate (again) your opinion [and that of others] on this.

Thanks very much in advance,
Guido

Reply by Guido Hooiveld, 8.4 years ago

duplicateCorrelation is especially intended for cases where you want to include a batch covariate in the model but can't because it is partially confounded with the covariate of interest, resulting in a rank-deficient model. In this design, you would not want to include TechRep in the model at all, so there's no need for duplicateCorrelation.
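A small base-R check of why TechRep does not belong in this model as a fixed effect: each TechRep ID occurs under only one condition, so the Condition column equals the sum of the TechRep7 to TechRep12 dummies and the design is rank-deficient. Only the Condition and TechRep columns of the targets file are used, and the design is a hypothetical one built to show the problem, not one anyone in the thread proposes fitting.

```r
# Condition and TechRep columns from the targets file in the question
targets <- data.frame(
  Condition = factor(rep(c("Ctrl", "Treatment"), each = 10)),
  TechRep   = factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12))
)

# Hypothetical design with TechRep as a fixed (batch-like) factor
X <- model.matrix(~ TechRep + Condition, data = targets)
print(ncol(X))     # intercept + 11 TechRep dummies + ConditionTreatment = 13
print(qr(X)$rank)  # 12, so one coefficient cannot be estimated

# Column(s) that R's pivoted QR moves to the end as linearly dependent
qx <- qr(X)
print(colnames(X)[qx$pivot[-seq_len(qx$rank)]])
```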

Reply by Ryan C. Thompson, 8.4 years ago