I'm struggeling to find the correct approach to handling batch effects. I have a longitudinal RNA-Seq experiment where patient samples were measured in 3 batches: First the samples from the inital visit (visit 0, batch "baseline") were sequenced, some had to be re-sequenced due to QC issues (batch "baseline_reseq") and finally the follow-up samples from visits 1, 2, 3 came in and were sequenced in the third batch. Here is an example of the meta data matrix:
For now we're only looking for differentailly expressed genes between visit 0 and visit 1.
I think I should fit a mixed model using the duplicateCorrelation function to account for the batch effects like in this example.
## Limma with batch as blocking factor dge <- DGEList(counts=counts) design.test <- data.frame(group=c(rep(1,2), rep(2,2), rep(3,2)), batch=c(1,2,1,2,1,2)) design <- model.matrix(~group, design.test) dge <- calcNormFactors(dge) v <- voom(dge, design, plot=FALSE) corfit <- duplicateCorrelation(v, design, block=design.test$batch) # re-run voom & dupCorr to get a more accurate fit (recommended [here]) v <- voom(dge, design, block=design.test$batch, correlation = corfit$consensus) corfit <- duplicateCorrelation(v, design, block=design.test$batch) fit <- lmFit(v, design, block=design.test$batch, correlation=corfit$consensus) results <- topTable(fit, coef="TimepointT120", n=nrow(elist), sort.by="none")
But how do I account for the patient in this case? Would it make sense to combine patient and batch and use the combined factor as blocking factor?
Your advice is much appreciated!