I have multiple sources of batch effects that show associations with highest principal components that I want to remove from my expression matrix for downstream analysis (clustering, pca).
I am following the sva::ComBat implementation and wanted to see if I can model more that one batch effect at the same time (I do not see an example of how I could do this on the vignette) or if I should iterate and remove each at a time as such:
> modcombat = model.matrix(~1, data=pheno) combat_edata1 =
> ComBat(dat=edata, batch=batch1, mod=modcombat, par.prior=TRUE,
> prior.plots=FALSE) combat_edata2 = ComBat(dat=combat_edata1,
> batch=batch2, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
Or should I set my model matrix with the second batch, while combat-adjusting for first batch, then set the model matrix with the first batch while combat-adjusting for the second batch as follows:
> modcombat1 = model.matrix(~batch2, data=pheno)
> combat_edata1 = ComBat(dat=edata, batch=batch1, mod=modcombat1, par.prior=TRUE, prior.plots=FALSE)
> modcombat2 = model.matrix(~batch1, data=pheno)
> combat_edata2 = ComBat(dat=combat_edata1, batch=batch2, mod=modcombat2, par.prior=TRUE, prior.plots=FALSE)
Or alternatively, look at the combination of batch1-batch2 and treat them together, which just gives the original implementation in the vignette.
I want to add that I am not using ComBat for DE analysis as I'm modeling the batch effects in my design matrix for that purpose, but only interested in producing batch-corrected values for visualization purposes.
Does removeBatchEffect work on RNA-seq? I've only used limma in microarray. For RNA-seq I have rlog values from DESeq2.
For bulk RNA-seq, you would apply
removeBatchEffects()
to the variance-stabilised or regularised log expression levels (both DESeq2), or the log [base 2] CPM expression levels (EdgeR and limma-voom), i.e., the transformed expression levels after the data has already been normalised. I provide a use-case, here: https://www.biostars.org/p/434736/#434769