Question: limma - technical replicates: duplicateCorrelation() or avereps()?
Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote, 2.7 years ago:

I am about to analyze an Affymetrix experiment that contains both repeated measurements (cells from the same individual measured more than once) and technical replicates (for most samples, the same RNA was labelled and hybridized twice).

For the repeated measurements I know I should use a paired analysis (i.e. block on SubjectID by including it in the model), but I am unsure how best to use the information from the technical replicates.

Some posts on this site suggest handling the technical replicates with duplicateCorrelation(), using the TechRep column as the block (e.g. see the answer to "Duplicate Correlation with technical replicates"), whereas others recommend averaging them with avereps() (e.g. see the answer to "Including technical and biological replicates in Limma block analysis?").

Any suggestions would be appreciated.

Thanks, Guido

 

Part of the targets file.

> targets
                 Filename Condition SubjectID TechRep
1   G141_B09_MUSCLE_6.CEL      Ctrl         1       1
2   G141_C09_MUSCLE_7.CEL      Ctrl         1       1
3   G141_D09_MUSCLE_8.CEL      Ctrl         2       2
4   G141_E07_MUSCLE_1.CEL      Ctrl         2       2
5   G141_F07_MUSCLE_2.CEL      Ctrl         3       3
6  G141_F09_MUSCLE_10.CEL      Ctrl         3       3
7   G141_G07_MUSCLE_3.CEL      Ctrl         4       4
8  G141_G09_MUSCLE_11.CEL      Ctrl         4       4
9   G141_H07_MUSCLE_4.CEL      Ctrl         5       5
10 G141_H09_MUSCLE_12.CEL      Ctrl         6       6
11 G141_B09_MUSCLE_16.CEL Treatment         1       7
12 G142_C09_MUSCLE_17.CEL Treatment         1       7
13 G142_D09_MUSCLE_18.CEL Treatment         2       8
14 G142_E07_MUSCLE_11.CEL Treatment         2       8
15 G142_F07_MUSCLE_12.CEL Treatment         3       9
16 G142_F09_MUSCLE_20.CEL Treatment         3       9
17 G142_G07_MUSCLE_13.CEL Treatment         4      10
18 G142_G09_MUSCLE_21.CEL Treatment         4      10
19 G142_H07_MUSCLE_14.CEL Treatment         5      11
20 G142_H09_MUSCLE_22.CEL Treatment         6      12
>
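A minimal R/limma sketch of the duplicateCorrelation() option described in the question: SubjectID goes into the design for the paired comparison, and TechRep is passed as the block. It assumes a hypothetical matrix `y` of normalized log2 expression values whose 20 columns follow the row order of the targets table; simulated numbers stand in for real data, so the results themselves mean nothing. duplicateCorrelation() may also need the statmod package to be installed.

```r
library(limma)

# Targets table from the post (file names omitted)
targets <- data.frame(
  Condition = rep(c("Ctrl", "Treatment"), each = 10),
  SubjectID = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6,
                7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Placeholder data (hypothetical): 1000 genes x 20 arrays of simulated log2 values
set.seed(1)
y <- matrix(rnorm(1000 * 20, mean = 8), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("array", 1:20)))

# Paired analysis: SubjectID as a fixed blocking factor in the design
Subject   <- factor(targets$SubjectID)
Condition <- factor(targets$Condition, levels = c("Ctrl", "Treatment"))
design <- model.matrix(~ Subject + Condition)

# Arrays sharing a TechRep ID are treated as correlated;
# one consensus correlation is estimated across all genes
corfit <- duplicateCorrelation(y, design, block = targets$TechRep)
fit <- lmFit(y, design, block = targets$TechRep,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
topTable(fit, coef = "ConditionTreatment")
```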
Answer: Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 2.7 years ago:

I would use avereps(). Using duplicateCorrelation() would lead to a somewhat liberal analysis (p-values that are too small). duplicateCorrelation() is intended for more complicated situations in which the technical replicates can't be averaged without losing some information about the treatments.
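A minimal R/limma sketch of the averaging approach, assuming a hypothetical matrix `y` of normalized log2 expression values whose 20 columns follow the row order of the targets table in the question (simulated numbers stand in for real data). Because the technical replicates here are arrays (columns) rather than probes (rows), it uses avearrays(), limma's column-wise counterpart of avereps(); for a plain matrix this should match t(avereps(t(y), ID = targets$TechRep)). The averaged data then go into the same paired design, with 12 columns instead of 20.

```r
library(limma)

# Targets table from the question (file names omitted)
targets <- data.frame(
  Condition = rep(c("Ctrl", "Treatment"), each = 10),
  SubjectID = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6,
                7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Placeholder data (hypothetical): 1000 genes x 20 arrays
set.seed(1)
y <- matrix(rnorm(1000 * 20, mean = 8), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("array", 1:20)))

# Average technical replicate arrays: one column per TechRep ID,
# in order of first appearance ("1", "2", ..., "12")
y.ave <- avearrays(y, ID = targets$TechRep)

# One row of sample information per averaged column, in the same order
targets.ave <- targets[!duplicated(targets$TechRep), ]

# Paired design on the 12 averaged samples
Subject   <- factor(targets.ave$SubjectID)
Condition <- factor(targets.ave$Condition, levels = c("Ctrl", "Treatment"))
design <- model.matrix(~ Subject + Condition)

fit <- eBayes(lmFit(y.ave, design))
topTable(fit, coef = "ConditionTreatment")
```

The columns for subjects 5 and 6 each come from a single array, so they are less precise than the averaged columns.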

If you are worried that some samples have n=2 technical replicates and others only n=1, you could run arrayWeights() on the averaged data. That will compensate for differences in precision, if they exist.
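A minimal R/limma sketch of that suggestion, assuming a hypothetical matrix `y` of normalized log2 expression values laid out like the targets table in the question, with simulated numbers as placeholders. The technical replicates are averaged with avearrays() (the column-wise form of avereps()), and arrayWeights() then estimates one precision weight per averaged column, which lmFit() uses through its weights argument.

```r
library(limma)

# Targets table from the question (file names omitted)
targets <- data.frame(
  Condition = rep(c("Ctrl", "Treatment"), each = 10),
  SubjectID = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6,
                7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# Placeholder data (hypothetical): 1000 genes x 20 arrays
set.seed(1)
y <- matrix(rnorm(1000 * 20, mean = 8), nrow = 1000)

# Average technical replicates and keep one targets row per averaged column
y.ave <- avearrays(y, ID = targets$TechRep)
targets.ave <- targets[!duplicated(targets$TechRep), ]

Subject   <- factor(targets.ave$SubjectID)
Condition <- factor(targets.ave$Condition, levels = c("Ctrl", "Treatment"))
design <- model.matrix(~ Subject + Condition)

# One relative precision weight per averaged column, estimated from the
# residuals of the linear model; columns that fit less well get lower weights
aw <- arrayWeights(y.ave, design)

# A vector with one weight per column is applied as array weights
fit <- eBayes(lmFit(y.ave, design, weights = aw))
topTable(fit, coef = "ConditionTreatment")
```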


OK, thanks! I don't fully understand, though: would you mind briefly explaining, or giving a small example of, what you mean by a "more complicated situation in which the technical replicates can't be averaged without losing some information about the treatments"?

Reply by Guido Hooiveld, 2.7 years ago.

Dear Gordon,

Sorry to revive this thread, but I just read a letter to the editor of Biostatistics in which the authors implicitly claim the opposite of what you suggested, namely that it is 'better' to handle technical replicates with duplicateCorrelation() than with avereps() (or avearrays()). Interestingly, in doing so they cite a number of your papers and mailing-list posts.

Letter: http://dx.doi.org/10.1093/biostatistics/kxw031

The authors base their conclusion on the number of significant genes identified in a data set. I realize that when "the truth" is unknown (as is usually the case in omics experiments), the relevance of this read-out is questionable, especially since you stated that using duplicateCorrelation() leads to a liberal analysis, which by itself would be expected to yield more significant genes. On the other hand, the authors found similar numbers of significant genes with two other methods, although they do not report the overlap between these gene lists.

Since the question of how to handle technical replicates comes up regularly on this list, I would (again) appreciate your opinion, and that of others, on this.

Thanks very much in advance,
Guido

Reply by Guido Hooiveld, 2.7 years ago.

duplicateCorrelation() is intended especially for cases where you want to include a batch covariate in the model but can't, because it is partially confounded with the covariate of interest and including it would make the model rank-deficient. In this design you would not want to include TechRep in the model at all, so there is no need for duplicateCorrelation().

Reply by Ryan C. Thompson, 2.7 years ago.
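A minimal R/limma sketch of a design where a blocking factor cannot enter the model as a fixed effect, and where averaging would also lose information about the treatments. The layout is hypothetical and not from this thread: six subjects (three healthy, three diseased), each sampled in two tissues, A and B, with simulated numbers as placeholder data. With Subject as a fixed effect the design is rank-deficient, because the Disease column equals the sum of the diseased subjects' columns; averaging the two samples per subject would remove the within-subject tissue comparison. Blocking on Subject with duplicateCorrelation() keeps both comparisons, along the lines of the multi-level example in the limma User's Guide.

```r
library(limma)

# Hypothetical layout: 6 subjects x 2 tissues; Disease varies between subjects only
targets <- data.frame(
  Subject = factor(rep(1:6, each = 2)),
  Disease = factor(rep(c("Healthy", "Diseased"), each = 6),
                   levels = c("Healthy", "Diseased")),
  Tissue  = factor(rep(c("A", "B"), times = 6))
)

# Subject as a fixed effect: DiseaseDiseased = Subject4 + Subject5 + Subject6,
# so the matrix is rank-deficient and the disease effect is not estimable
X <- model.matrix(~ Subject + Disease + Tissue, data = targets)
c(columns = ncol(X), rank = qr(X)$rank)   # 8 columns, rank 7

# Placeholder data (hypothetical): 1000 genes x 12 samples
set.seed(1)
y <- matrix(rnorm(1000 * 12, mean = 8), nrow = 1000)

# One coefficient per Disease/Tissue combination
Group <- factor(paste(targets$Disease, targets$Tissue, sep = "."))
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)

# Subject enters as a random effect through the within-subject correlation
corfit <- duplicateCorrelation(y, design, block = targets$Subject)
fit <- lmFit(y, design, block = targets$Subject,
             correlation = corfit$consensus.correlation)

cm <- makeContrasts(
  DiseaseInTissueA = Diseased.A - Healthy.A,  # between-subject comparison
  TissueInHealthy  = Healthy.B - Healthy.A,   # within-subject comparison
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, coef = "DiseaseInTissueA")
```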