I am using limma to determine differential gene expression between healthy and KO mice. I know that there is a very strong batch effect that I have to adjust for. I read both on this forum and on _Biostars_ that it is generally recommended to include the batch effect in the limma model instead of using, for example, _ComBat-seq_ to adjust for it beforehand.
I wanted to make sure that limma can appropriately control for the batch effect, so my idea was to fit a model with and without the batch effect included in the design and then perform a PCA on the residuals. My reasoning was that if limma appropriately controls for the batch effect, samples should not cluster according to batch in the PCA of the residuals.
So the first model I ran looked like this:
contrast <- c("KO-WT")
d <- calcNormFactors(d, method = "TMM")
mm <- model.matrix(~0 + status, data = d$samples)
cmat <- makeContrasts(contrasts = contrast, levels = mm)
y <- voom(d, design = mm, plot = FALSE, normalize.method = "quantile")
fit <- lmFit(y, mm)
fit <- eBayes(fit)
# get residuals on the voom log2-CPM scale; residuals.MArrayLM needs the
# voom EList y, not the DGEList d (passing d would use raw counts)
res <- residuals(fit, y)
cfit <- contrasts.fit(fit, cmat)
cfit <- eBayes(cfit)
top.table <- topTable(cfit, sort.by = "P", n = Inf)
I then performed a PCA on the absolute values of the (log-scale) residuals and see that the samples clearly separate on PC1 (41% of variance explained) according to the batch they are from.
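For completeness, this is roughly how I run the PCA step. It is a minimal sketch: in the real analysis `res` is the genes x samples residual matrix from `residuals(fit, y)`, and the batch colouring comes from `d$samples$batch`; here a small random matrix stands in so the snippet is self-contained.

```r
# Stand-in for the residual matrix (genes in rows, samples in columns)
set.seed(1)
res <- matrix(rnorm(500 * 6), nrow = 500,
              dimnames = list(NULL, paste0("s", 1:6)))

# prcomp() expects samples in rows, hence the transpose; centre but do not
# rescale, since voom residuals are already on a common log2 scale
pca <- prcomp(t(res), center = TRUE, scale. = FALSE)

# percent variance explained by each PC, for the axis labels
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# colour points by batch, e.g.:
# plot(pca$x[, 1], pca$x[, 2], col = d$samples$batch)
```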
Then I ran another model where I include the batch as a covariate:

mm <- model.matrix(~0 + status + batch, data = d$samples)
I ran the PCA again with the same settings, but the plot stayed essentially the same: the samples still cluster according to batch along PC1 (now 40.9% of variance explained).
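To sanity-check my own reasoning, I confirmed with plain `lm()` on toy data that residuals from a design containing batch really should carry no batch signal (they are orthogonal to the batch columns of the design). The variable names mirror my real design, but the data below are simulated with an artificial batch shift:

```r
# 8 samples, status and batch balanced, plus a pure batch-B shift of 3
set.seed(42)
status <- factor(rep(c("WT", "KO"), each = 4))
batch  <- factor(rep(c("A", "B"), times = 4))
expr   <- rnorm(8) + 3 * (batch == "B")

# residuals without and with batch in the design
r_no_batch   <- resid(lm(expr ~ 0 + status))
r_with_batch <- resid(lm(expr ~ 0 + status + batch))

# per-batch means of the residuals: clearly nonzero without batch in the
# design, numerically zero once batch is included
tapply(r_no_batch, batch, mean)
tapply(r_with_batch, batch, mean)
```

So at least for ordinary least squares the expectation holds, which is why the unchanged PCA of the limma residuals surprised me.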
My question is whether my approach is simply wrong, or whether this means that limma can't adjust for the batch effect. Either way, I would appreciate any insights you might have on how I can check whether limma appropriately controls for a batch effect.