I am about to analyze an Affymetrix experiment that consists of repeated measurements (cells from the same individual measured multiple times) as well as technical replicates for most of the samples (i.e. the RNA samples were labelled and hybridized twice).
For the repeated measurements I know I should go for a paired analysis (i.e. block on SubjectID and include this in the model), but I am unsure how best to use the information from the technical replicates.
According to some posts on this site, the technical replicates could be analyzed using the duplicateCorrelation() function, blocking on the column TechRep (e.g. see A: Duplicate Correlation with technical replicates), whereas other posts recommend the use of avereps() (e.g. see A: Including technical and biological replicates in Limma block analysis?).
Any suggestions would be appreciated.
Thanks, Guido
Part of the targets file.
> targets
                 Filename Condition SubjectID TechRep
1   G141_B09_MUSCLE_6.CEL      Ctrl         1       1
2   G141_C09_MUSCLE_7.CEL      Ctrl         1       1
3   G141_D09_MUSCLE_8.CEL      Ctrl         2       2
4   G141_E07_MUSCLE_1.CEL      Ctrl         2       2
5   G141_F07_MUSCLE_2.CEL      Ctrl         3       3
6  G141_F09_MUSCLE_10.CEL      Ctrl         3       3
7   G141_G07_MUSCLE_3.CEL      Ctrl         4       4
8  G141_G09_MUSCLE_11.CEL      Ctrl         4       4
9   G141_H07_MUSCLE_4.CEL      Ctrl         5       5
10 G141_H09_MUSCLE_12.CEL      Ctrl         6       6
11 G141_B09_MUSCLE_16.CEL Treatment         1       7
12 G142_C09_MUSCLE_17.CEL Treatment         1       7
13 G142_D09_MUSCLE_18.CEL Treatment         2       8
14 G142_E07_MUSCLE_11.CEL Treatment         2       8
15 G142_F07_MUSCLE_12.CEL Treatment         3       9
16 G142_F09_MUSCLE_20.CEL Treatment         3       9
17 G142_G07_MUSCLE_13.CEL Treatment         4      10
18 G142_G09_MUSCLE_21.CEL Treatment         4      10
19 G142_H07_MUSCLE_14.CEL Treatment         5      11
20 G142_H09_MUSCLE_22.CEL Treatment         6      12
OK, thanks! However, since I don't fully understand, would you mind briefly explaining, or giving a small example of, what you mean by a "more complicated situation in which the technical replicates can't be averaged without losing some information about the treatments"?
Dear Gordon,
Sorry to revive this thread, but I just read a letter to the editor of Biostatistics in which the authors implicitly claim the contrary of what you suggested: that it would be 'better' to use duplicateCorrelation() to handle technical replicates rather than avereps() (or avearrays()). Interestingly, in doing so they refer to a number of your papers and mailing list posts.
Letter: http://dx.doi.org/10.1093/biostatistics/kxw031
The conclusion of the authors is based on the number of significant genes that were identified in a data set, but I realize that if you don't know "the truth" (as is usually the case in omics experiments) one could question the relevance of this read-out, especially since you stated that using duplicateCorrelation() indeed leads to a liberal analysis. On the other hand, using two other methods the authors found similar numbers of significant genes, although the overlap between these lists of genes is not reported.
Since the topic of how to handle technical replicates regularly pops up on this list, I would appreciate (again) your opinion [and that of others] on this.
Thanks very much in advance,
Guido
duplicateCorrelation is especially intended for cases where you want to include a batch covariate in the model but can't because it is partially confounded with the covariate of interest, resulting in a rank-deficient model. In this design, you would not want to include TechRep in the model at all, so there's no need for duplicateCorrelation.
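For what it's worth, here is a minimal sketch of how the averaging route mentioned earlier in the thread could look for this design. It assumes the technical replicates are collapsed with avearrays() and the pairing is then handled by putting SubjectID in the design matrix. The expression matrix `y` is simulated placeholder data (in practice it would be the normalized log2 expression matrix), and the object names `y`, `y.ave` and `targets.ave` are only illustrative, not taken from the original posts.

```r
library(limma)

# Targets as posted in the question (only the columns needed here).
targets <- data.frame(
  Condition = rep(c("Ctrl", "Treatment"), each = 10),
  SubjectID = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), times = 2),
  TechRep   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6,
                7, 7, 8, 8, 9, 9, 10, 10, 11, 12)
)

# ASSUMPTION: simulated stand-in for the normalized log2 expression matrix
# (100 hypothetical probes x 20 arrays), used only so the sketch runs.
set.seed(1)
y <- matrix(rnorm(100 * nrow(targets)), nrow = 100)
rownames(y) <- paste0("probe", 1:100)

# Average the arrays that share a TechRep value: 20 arrays become 12 samples.
y.ave <- avearrays(y, ID = targets$TechRep)

# Keep one targets row per averaged column, in the same column order.
targets.ave <- targets[match(colnames(y.ave), targets$TechRep), ]

# Paired analysis: subject enters the linear model as a blocking factor.
Subject   <- factor(targets.ave$SubjectID)
Condition <- factor(targets.ave$Condition, levels = c("Ctrl", "Treatment"))
design    <- model.matrix(~ Subject + Condition)

fit <- eBayes(lmFit(y.ave, design))
topTable(fit, coef = "ConditionTreatment")
```

After averaging, each of the six subjects contributes one Ctrl and one Treatment sample, so TechRep no longer needs to appear in the model and the treatment effect is estimated within subjects.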