6 days ago by
Cambridge, United Kingdom
As Gordon suggests, the diversity of possible designs makes it difficult to give a hard-and-fast rule. Nonetheless, here are some thoughts:
Technical replicates: If these are generated by literally sequencing the same sample multiple times (e.g., on different lanes), just add them together and treat the resulting sum as a single sample.
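To make "add them together" concrete, here is a minimal sketch of the summation in Python. It is for illustration only: in practice this would be done on the count matrix in R before voom. The function name and the dict-of-columns layout are hypothetical. Counts for columns that come from the same sample (e.g., different lanes) are added gene by gene, leaving one column per sample.

```python
def sum_technical_replicates(counts, sample_ids):
    """Sum count columns that share a sample ID.

    counts: dict mapping column name (e.g. a lane) -> list of per-gene counts
    sample_ids: dict mapping column name -> the sample it was sequenced from
    Returns a dict mapping sample ID -> summed per-gene counts.
    """
    summed = {}
    for column, values in counts.items():
        sample = sample_ids[column]
        if sample not in summed:
            summed[sample] = [0] * len(values)
        acc = summed[sample]
        if len(acc) != len(values):
            raise ValueError(f"column {column!r} has a different number of genes")
        for i, v in enumerate(values):
            acc[i] += v  # gene-wise addition of raw counts
    return summed


# Hypothetical example: sample A sequenced on two lanes, sample B on one.
counts = {
    "A_lane1": [10, 0, 5],
    "A_lane2": [12, 1, 3],
    "B_lane1": [7, 2, 9],
}
sample_ids = {"A_lane1": "A", "A_lane2": "A", "B_lane1": "B"}
print(sum_technical_replicates(counts, sample_ids))
# {'A': [22, 1, 8], 'B': [7, 2, 9]}
```

The summed columns are then treated as ordinary samples in the downstream analysis.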
Not-quite-technical replicates: These are usually things like "we took multiple samples from the same donor", so they're not fully fledged biological replicates, but they aren't totally technical either. In most cases, I would just add them together and move on because I don't care about capturing the variability within levels of the blocking factor. For example, if biopsies are variable within a patient but the average expression across multiple biopsies is consistent across patients, then the latter is all I care about. ~~On the other hand, if I did expect the repeated samples to be similar, I would want to penalize genes that exhibit variation between them, so I'd like to capture that variation with `duplicateCorrelation`.~~ (Update: see comment below.)
Also, when summing, it is better for each repeated sample to contribute evenly to the sum for a given blocking level; this gives a more stable sum and thus lower variance across levels. It may also be wise to use `voomWithQualityWeights` to account for differences in the number of repeated samples per donor.
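Whether the repeated samples contribute evenly can be checked before summing. A minimal sketch, assuming that library size is a reasonable proxy for each repeated sample's contribution to the sum. The function names and the 0.75 cutoff are hypothetical choices for illustration, not anything limma provides. The length of each donor's list is the number of repeated samples, which is the imbalance that `voomWithQualityWeights` is suggested for above.

```python
def contribution_summary(lib_sizes, donors):
    """Map each donor to a list of (sample, fraction) pairs, where fraction is
    that repeated sample's share of the donor's summed library size."""
    totals = {}
    for sample, size in lib_sizes.items():
        donor = donors[sample]
        totals[donor] = totals.get(donor, 0) + size
    summary = {}
    for sample, size in lib_sizes.items():
        donor = donors[sample]
        summary.setdefault(donor, []).append((sample, size / totals[donor]))
    return summary


def dominated_donors(summary, max_share=0.75):
    """Donors with more than one repeated sample where a single sample supplies
    more than max_share of the sum (max_share is an arbitrary, hypothetical cutoff)."""
    return sorted(
        donor
        for donor, parts in summary.items()
        if len(parts) > 1 and max(frac for _, frac in parts) > max_share
    )


# Hypothetical library sizes for repeated biopsies from three donors.
lib_sizes = {"d1_a": 1_000_000, "d1_b": 1_000_000,
             "d2_a": 9_000_000, "d2_b": 1_000_000,
             "d3_a": 2_000_000}
donors = {s: s.split("_")[0] for s in lib_sizes}

summary = contribution_summary(lib_sizes, donors)
for donor, parts in summary.items():
    # donor, number of repeated samples, each sample's share of the sum
    print(donor, len(parts), [round(f, 2) for _, f in parts])
print(dominated_donors(summary))  # ['d2']
```

Here donor `d2`'s sum is essentially one biopsy, so its summed profile is less stable than the others.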
Repeated samples with different uninteresting predictors: This refers to situations where repeated samples from the same blocking level do not share the same values for the predictors in the design matrix, e.g., because some repeated samples were processed in a different batch. If the repeated samples for each blocking level have the same pattern of values for those predictors (e.g., each blocking level has one repeated sample in each of three batches), summation is still possible. In general, however, this is not the case, and `duplicateCorrelation` must be used.
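The "same pattern of values" condition can be checked mechanically. A minimal sketch, assuming batch is the only uninteresting predictor that varies among repeated samples. The helper names are hypothetical. It compares the multiset of batch labels across donors: if every donor has the same multiset, every summed profile has the same batch composition and summation is still possible; otherwise, fall back to `duplicateCorrelation` as described above.

```python
from collections import Counter


def batch_patterns(donors, batches):
    """Map each donor to a Counter of the batch labels of its repeated samples."""
    patterns = {}
    for sample, donor in donors.items():
        patterns.setdefault(donor, Counter())[batches[sample]] += 1
    return patterns


def summation_is_balanced(donors, batches):
    """True if every donor has the same multiset of batch labels, so every
    summed profile has the same batch composition."""
    distinct = {frozenset(p.items()) for p in batch_patterns(donors, batches).values()}
    return len(distinct) <= 1


# Balanced: each donor has one repeated sample in each of three batches.
donors = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "B3": "B"}
batches = {"A1": 1, "A2": 2, "A3": 3, "B1": 1, "B2": 2, "B3": 3}
print(summation_is_balanced(donors, batches))  # True

# Unbalanced: donor B's third sample was run in batch 2 instead of batch 3.
batches_unbalanced = dict(batches, B3=2)
print(summation_is_balanced(donors, batches_unbalanced))  # False
```

Because it compares multisets, this check also fails when donors have different numbers of repeated samples.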
Repeated samples with different interesting predictors: This refers to situations where repeated samples do not share the same values for the predictors in the design matrix because those predictors are interesting and their effects are to be tested. The archetypal example is collecting samples before and after treatment from each patient. Here, we can either use `duplicateCorrelation` or block on the uninteresting factor (here, the patient) in the design matrix. I prefer the latter as it avoids a few assumptions of the former, namely that all genes share the same consensus correlation. (There's also an assumption about the distribution of the random effect, but I can't remember what it was; maybe normal i.i.d.) However, `duplicateCorrelation` is more general and is the only solution when you want to compare across blocking levels, e.g., comparing diseased and healthy donors when each donor also contributes before/after treatment samples.
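Why blocking in the design matrix cannot handle comparisons across blocking levels can be seen from the rank of the design matrix. A minimal sketch in Python with a hypothetical layout of four patients, each sampled before and after treatment, two of them diseased. The blocked design is roughly what `model.matrix(~patient + treatment)` would give in R. Adding a disease column to it does not increase the rank, because disease is a sum of patient columns, so its effect is not estimable. Dropping the patient columns (the route where patient is instead passed as the block to `duplicateCorrelation`) keeps the disease column estimable.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def design_matrix(patients, treated, diseased, block_on_patient, include_disease):
    """Treatment-contrast design: intercept, optional patient columns (first
    patient as reference), optional donor-level disease column, treatment."""
    levels = sorted(set(patients))
    rows = []
    for p, t in zip(patients, treated):
        row = [1]  # intercept
        if block_on_patient:
            row += [int(p == lvl) for lvl in levels[1:]]  # patient blocking
        if include_disease:
            row.append(diseased[p])  # constant within each patient
        row.append(t)  # within-patient treatment effect
        rows.append(row)
    return rows


# Hypothetical layout: 4 patients x (before, after); P3 and P4 are diseased.
patients = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]
treated = [0, 1, 0, 1, 0, 1, 0, 1]
diseased = {"P1": 0, "P2": 0, "P3": 1, "P4": 1}

for name, block, disease in [("blocked", True, False),
                             ("blocked+disease", True, True),
                             ("unblocked+disease", False, True)]:
    X = design_matrix(patients, treated, diseased, block, disease)
    print(name, "columns:", len(X[0]), "rank:", rank(X))
# blocked columns: 5 rank: 5
# blocked+disease columns: 6 rank: 5
# unblocked+disease columns: 3 rank: 3
```

The treatment effect is estimable in either design; only the across-donor disease comparison needs the unblocked design.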
modified 5 days ago
Aaron Lun