Question

Removing batch effect with limma::removeBatchEffect() actually exacerbates the effect

2

drowsygoat ▴ 30

@lechkaczmarczyk-14172

Last seen 4.5 years ago

Poland

Hello,

I am attempting to remove batch effects from my data using limma::removeBatchEffect(). I have two batches of samples and four conditions. In the figures below, batches are color-coded. I'm wondering why the batch effect seems stronger after applying limma::removeBatchEffect().

The functions were running with default parameters, as follows:

      vst <- vst(dds)
      plotPCA(vst, "Sac")
      assay(vst) <- limma::removeBatchEffect(assay(vst), vst$Sac)
      plotPCA(vst, "Sac")

Before correction: [figure: PCA plot before limma batch correction]

After correction: [figure: PCA plot after limma batch correction]

limma deseq2 • 20k views

updated 6.5 years ago by Gordon Smyth 53k • written 6.5 years ago by drowsygoat ▴ 30

score 5 · Answer 1 · 2019-06-19

Two points.

First, your PCA plot does not suggest a substantial batch effect, so I wonder whether you need to worry about it.

Second, when you run removeBatchEffect you need to set the design argument so that the function knows what the four treatment conditions are. The batches are unbalanced with respect to conditions, and we only want to remove the batch effect within each condition level. For example:

      # `condition` is the factor holding the four treatment conditions
      design0 <- model.matrix(~condition)
      assay(vst) <- limma::removeBatchEffect(assay(vst), vst$Sac, design=design0)

Without setting the design argument, the effect you have seen is to be expected.
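To illustrate why the design argument matters when batches are unbalanced, here is a minimal simulated sketch. Everything in it is hypothetical (the sample layout, the effect sizes, and the variable names `condition`, `batch`, `truth` and `x` are invented for illustration and are not the poster's data). It compares the correction with and without `design` against the simulated batch-free values, after centering each gene.

```r
library(limma)

set.seed(1)

# Hypothetical layout: four conditions, two batches, unbalanced on purpose
# (batch 2 contains mostly samples from conditions C and D).
condition <- factor(rep(c("A", "B", "C", "D"), each = 4))
batch <- factor(c(1, 1, 1, 2,  1, 1, 1, 2,  1, 2, 2, 2,  1, 2, 2, 2))

n_genes <- 200
n_samples <- length(condition)

# Simulated log-expression: per-gene condition means plus small noise.
cond_means <- matrix(rnorm(n_genes * 4, sd = 2), nrow = n_genes)
truth <- cond_means[, as.integer(condition)] +
  matrix(rnorm(n_genes * n_samples, sd = 0.1), nrow = n_genes)

# Add a per-gene shift to every batch-2 sample.
batch_shift <- rnorm(n_genes, sd = 0.5)
x <- truth + outer(batch_shift, as.integer(batch) - 1)

# Correction without and with the treatment design.
naive <- removeBatchEffect(x, batch = batch)
design0 <- model.matrix(~condition)
with_design <- removeBatchEffect(x, batch = batch, design = design0)

# Compare each result with the batch-free data. Genes are centered first
# because the correction is free to shift a whole row by a constant.
center <- function(m) m - rowMeans(m)
mse <- function(a, b) mean((center(a) - center(b))^2)
err_naive <- mse(naive, truth)
err_design <- mse(with_design, truth)
print(c(naive = err_naive, with_design = err_design))
```

In this simulation the call without `design` attributes part of the condition differences to the batch factor and removes them, so its error should be clearly larger than that of the call with `design`.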

SVA and RUV don't seem to me to be appropriate here, because they are intended to discover the batch factor whereas you already know what it is. If you do use those algorithms, then you will have the same issue that you have with removeBatchEffect. When you do the actual batch correction, the batch correction algorithm will need to know the treatment conditions as well as the batch factor or surrogate variables.

score 3 · Answer 2 · 2019-06-19

3

Michael Love 43k

@mikelove

Last seen 8 days ago

United States

When you run the removeBatchEffect function, it removes shifts in the group means associated with the grouping factor you provide, for each row of the matrix. It seems like the shift is not shared across the conditions. Are these really just two batches, or were the condition samples divided further?
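As a small sketch of what "removing shifts in the group means, per row" amounts to when no design is given: on a toy matrix (the data and the names `x`, `batch` and `manual` are made up for illustration), the result of removeBatchEffect should match subtracting each batch's row mean and adding back the average of the batch means.

```r
library(limma)

set.seed(42)

# Toy data: 3 genes (rows), 5 samples (columns), two batches.
batch <- factor(c("b1", "b1", "b1", "b2", "b2"))
x <- matrix(rnorm(3 * 5), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:5)))

# limma's correction with no treatment design supplied.
corrected <- removeBatchEffect(x, batch = batch)

# Manual equivalent: per gene, move every batch mean to the
# midpoint of the batch means.
batch_means <- sapply(levels(batch), function(b) {
  rowMeans(x[, batch == b, drop = FALSE])
})                                   # genes x batches matrix
manual <- x - batch_means[, as.integer(batch)] + rowMeans(batch_means)

print(all.equal(corrected, manual, check.attributes = FALSE))
```

Because the shift is estimated once per row over all samples, it can only be right if the same shift applies in every condition.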

written 6.5 years ago by Michael Love 43k

0

Many thanks for your response Michael, I appreciate that. This was RNA-seq of mouse brain region- and cell-specific RNA immunoprecipitations. Groups denote the days the mice were sacrificed. Conditions were not divided further.

Since the outliers overlapped with the time points at which the specimens were sacrificed, I thought it was a sound approach to treat it as a batch effect (importantly, the mice sacrificed later were also born later, so it should not be related to age).

Of course it may be i) a coincidence or ii) a tissue preparation (experimental) artifact (e.g. lack of reproducibility in brain region dissection). If I understand correctly, the shift between those samples is inconsistent and therefore does not resemble a typical batch effect, hence the observed output of the removeBatchEffect function. Would it be a good use of time to try other tools to handle this?

If this is not a batch effect, I would hesitate between i) using the samples as they are for comparisons or ii) using only "red" ones, and tossing the "green" batch.

written 6.5 years ago by drowsygoat ▴ 30

0

I might try SVA or RUV.

Another thing I would do is find a few batch-y genes (via an LRT that removes the batch variable in the reduced model) and look at plotCounts() for these genes to see if the batch effect is consistent across conditions. The important thing for DE analysis is what happens at the gene level, while the PCA is just a QC plot that gives an overview of the variation.
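A rough sketch of that check, run on simulated data from DESeq2's makeExampleDESeqDataSet() so that it is self-contained. The batch column name `Sac` follows the question; the level names `d1`/`d2`, the balanced layout, and the dataset size are assumptions for illustration, and with your own data you would start from your existing `dds` instead.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real dataset: 200 genes, 12 samples,
# with a `condition` column (levels A and B) created by the helper.
dds <- makeExampleDESeqDataSet(n = 200, m = 12)

# Hypothetical batch column, named as in the question.
dds$Sac <- factor(rep(c("d1", "d2"), times = 6))

# Full model contains batch and condition; the reduced model drops batch,
# so the LRT asks which genes are explained better when batch is included.
design(dds) <- ~ Sac + condition
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ condition)
res <- results(dds_lrt)

# Take the gene with the strongest batch signal and get its counts
# together with batch and condition labels.
top_gene <- rownames(res)[which.min(res$pvalue)]
d <- plotCounts(dds_lrt, gene = top_gene,
                intgroup = c("Sac", "condition"), returnData = TRUE)
print(d)
```

Dropping `returnData = TRUE` draws the plot directly. If the batch shift for such genes points the same way in every condition, a single batch term in the design is a reasonable description of it.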

written 6.5 years ago by Michael Love 43k