I have the following experimental design, with three biological replicates per
Sample.Type performed in two batches (two different experimental dates,
Exp.Date):

> metadata
   Sample.ID Sample.Type Exp.Date
1     Ctrl_1        Ctrl        1
2     Ctrl_2        Ctrl        1
3     Ctrl_3        Ctrl        2
4      TxA_1         TxA        1
5      TxA_2         TxA        1
6      TxA_3         TxA        2
7      TxB_1         TxB        1
8      TxB_2         TxB        1
9      TxB_3         TxB        2
10     TxC_1         TxC        2
11     TxC_2         TxC        2
12     TxC_3         TxC        2

The aim is to test for differential expression in three comparisons: Ctrl vs. TxA, Ctrl vs. TxB, and Ctrl vs. TxC.
However, when plotting the TMM-normalised data using PCA, I noticed that PC2 (27% of variance) was associated with the batch (shape represents Exp.Date, and colour represents Sample.Type).
So for differential expression analysis using edgeR, I thought it would be best to use model.matrix(~Batch + Treatment) as the design, where Batch is metadata$Exp.Date and Treatment is metadata$Sample.Type, as per the edgeR user guide Section 3.4.3 "Batch effects". However, unlike the examples in the edgeR user guide, I have a situation where one sample type is not present in every batch (i.e. TxC has no samples in batch 1).
My question is: given that group TxC is only present in batch 2 (and has no samples in batch 1), is this the correct way to deal with the batch effect when testing TxC vs. Ctrl? Or do I need to analyse the TxC samples separately, i.e. compare TxC (n = 3) vs. Ctrl_3 (n = 1), given that this Ctrl sample was processed in the same batch as the TxC samples?
My understanding was that to model a batch effect, every sample type must be represented in every batch, but I am unsure whether that's right.
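
One way to check whether the additive model can be fitted at all for this layout is to build the design matrix and compare its rank with its number of columns. The following is a minimal sketch in base R only, assuming the metadata table above is retyped by hand, that Exp.Date is treated as a factor, and that Ctrl is the reference level for Treatment. It only checks estimability of the coefficients; it says nothing about statistical power.

```r
# Recreate the sample table from the post (values retyped by hand).
metadata <- data.frame(
  Sample.ID   = c("Ctrl_1", "Ctrl_2", "Ctrl_3",
                  "TxA_1",  "TxA_2",  "TxA_3",
                  "TxB_1",  "TxB_2",  "TxB_3",
                  "TxC_1",  "TxC_2",  "TxC_3"),
  Sample.Type = rep(c("Ctrl", "TxA", "TxB", "TxC"), each = 3),
  Exp.Date    = c(1, 1, 2,  1, 1, 2,  1, 1, 2,  2, 2, 2)
)

# Batch must be a factor, otherwise Exp.Date is fitted as a numeric covariate.
Batch     <- factor(metadata$Exp.Date)
# Assumption: Ctrl is the reference level, so each treatment
# coefficient is a Tx-vs-Ctrl contrast adjusted for batch.
Treatment <- factor(metadata$Sample.Type,
                    levels = c("Ctrl", "TxA", "TxB", "TxC"))

design <- model.matrix(~Batch + Treatment)

# Cross-tabulation shows which Treatment/Batch cells are empty.
print(table(Treatment, Batch))

# If the rank equals the number of columns, every coefficient is estimable.
design_rank  <- qr(design)$rank
is_full_rank <- design_rank == ncol(design)
print(colnames(design))
print(is_full_rank)

# Residual degrees of freedom left for dispersion estimation.
residual_df <- nrow(design) - design_rank
print(residual_df)
```

For this layout the design has five columns and rank five, so the TreatmentTxC coefficient is estimable even though TxC only appears in batch 2: the batch difference is estimated from the Ctrl, TxA and TxB samples, which occur on both dates.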
Your help and advice would be greatly appreciated!
Many thanks, Rebecca
OK, good to know. Thanks so much for your help, Gordon. I really appreciate it.