I have been using ComBat to correct for batch effects in 450k data (~10 batches). Recently, I read an old reply from Dr. Peter Langfelder in which he mentioned that "ComBat should NOT be used before running association testing (lmFit); association testing should be run with batch as a covariate on original data."
I have also read the paper from the ComBat authors, where they report that ComBat performs better than SVD when the sample size is small and comparably when the sample size is large.
Now, my question is: since my sample size is large and I am using limma to test for differential methylation, should I adjust for batch directly in limma as a covariate? (Or should I use removeBatchEffect()? I am not sure about this, as it would again amount to removing batch in a separate step.)
batch <- pheno$batch
BSC <- pheno$BSC_batch
group <- factor(targets$status, levels=c("Control","Case"))
design <- model.matrix(~targets$Age+batch+BSC+group)
fit <- lmFit(Mval, design)
Or should I continue using ComBat to remove batch and limma to adjust for the other biological covariates?
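# First pass: remove chip batch
MvalC <- ComBat(Mval, batch=batch, mod=NULL, par.prior = TRUE, prior.plots = FALSE)
# Note: modcombat is defined here but not passed to either ComBat call (mod=NULL)
modcombat <- model.matrix(~targets$Age, data=pheno)
# Second pass: remove BSC batch from the already-corrected values
MvalC1 <- ComBat(MvalC, batch=BSC, mod=NULL, par.prior = TRUE, prior.plots = FALSE)
design <- model.matrix(~targets$Age+group)
fit <- lmFit(MvalC1, design)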
(I corrected for chip batch and BSC batch separately because they were confounded and limma gave an error when both were included in the design.)
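To illustrate the kind of confounding I mean, here is a minimal sketch with made-up data (not my real sample sheet). The names chip and bsc and the 8-sample layout are hypothetical; the assumption is that every chip was processed within a single BSC batch, so the BSC column can be written as a sum of chip columns and the design matrix loses rank. It uses base R only.

```r
# Hypothetical toy layout: 4 chips, each nested inside one of 2 BSC batches
chip  <- factor(c("c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"))
bsc   <- factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"))
group <- factor(rep(c("Control", "Case"), 4), levels = c("Control", "Case"))

# Cross-tabulate the two batch factors: every chip row has a single
# non-zero cell, which means chip fully determines bsc
print(table(chip, bsc))

# Same structure as the design with both batch factors included
design <- model.matrix(~ chip + bsc + group)

# Compare the numerical rank with the number of coefficients
rank_design  <- qr(design)$rank
is_full_rank <- rank_design == ncol(design)
print(is_full_rank)  # FALSE for this toy layout
```

If the real batch factors are nested like this, the BSC effect cannot be estimated separately from the chip effect in a single model, which would be consistent with the error I saw.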