sva ComBat implementation for multiple sources of batch effects
3
0
Entering edit mode
Zach Roe ▴ 10
@zach-roe-11189
Last seen 2.1 years ago

I have multiple sources of batch effects that show associations with highest principal components that I want to remove from my expression matrix for downstream analysis (clustering, pca).

I am following the sva::ComBat implementation and wanted to see if I can model more that one batch effect at the same time (I do not see an example of how I could do this on the vignette) or if I should iterate and remove each at a time as such:

> modcombat = model.matrix(~1, data=pheno) combat_edata1 =
> ComBat(dat=edata, batch=batch1, mod=modcombat, par.prior=TRUE,
> prior.plots=FALSE) combat_edata2 = ComBat(dat=combat_edata1,
> batch=batch2, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)


Or should I set my model matrix with the second batch, while combat-adjusting for first batch, then set the model matrix with the first batch while combat-adjusting for the second batch as follows:

> modcombat1 = model.matrix(~batch2, data=pheno)
> combat_edata1 = ComBat(dat=edata, batch=batch1, mod=modcombat1, par.prior=TRUE, prior.plots=FALSE)

> modcombat2 = model.matrix(~batch1, data=pheno)
> combat_edata2 = ComBat(dat=combat_edata1, batch=batch2, mod=modcombat2, par.prior=TRUE, prior.plots=FALSE)


Or alternatively, look at the combination of batch1-batch2 and treat them together, which just gives the original implementation in the vignette.

I want to add that I am not using ComBat for DE analysis as I'm modeling the batch effects in my design matrix for that purpose, but only interested in producing batch-corrected values for visualization purposes.

sva combat batch-adjustment • 492 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 23 minutes ago
United States

If you just need the data for visualization purposes you should probably use removeBatchEffect in limma, which will emulate what you are doing within your linear model.

0
Entering edit mode

Does removeBatchEffect work on RNA-seq? I've only used limma in microarray. For RNA-seq I have rlog values from DESeq2.

2
Entering edit mode

For bulk RNA-seq, you would apply removeBatchEffects() to the variance-stabilised or regularised log expression levels (both DESeq2), or the log [base 2] CPM expression levels (EdgeR and limma-voom), i.e., the transformed expression levels after the data has already been normalised. I provide a use-case, here: https://www.biostars.org/p/434736/#434769

2
Entering edit mode
@w-evan-johnson-5447
Last seen 2.0 years ago
United States

Zach, if you are doing RNA-seq, you should try ComBat_seq. Its available in the most recent Bioconductor release. If you have a strong batch effect, the methodology mitigates the exaggerated significance problem so we recommend that it can be used for differential expression (which incidentally, removeBatchEffect doesn’t mitigate exaggerated significance in the case of strong variance batch effects).

1
Entering edit mode
@w-evan-johnson-5447
Last seen 2.0 years ago
United States

The publication for ComBat_seq is pending, but here is the Bioxiv: https://www.biorxiv.org/content/10.1101/2020.01.13.904730v1

0
Entering edit mode

Hi Evan, thank you. I'm also working with DNA methylation data, and my question above is specific to that. Do you have recommendation when there are more than 2 batch sources if I can implement ComBat twice? Each time correcting for one type of batch effect?

In my case I have both 450K and EPIC array data. I first implement ComBat in each platform to correct for array batches. I then merged the two platforms, and then want to implement ComBat again to correct for platform as batch.