remove microarray batch effects using Limma
2
8
Entering edit mode
@xie-zhi-nihnhlbi-e-4320
Last seen 8.1 years ago

Hi everyone,

I have some microarray data files containing two sets of samples in normal and disease condition. I have tested that the data also contain significant batch effects with hybridization time. However, the positive hits I obtained using the following approaches are very different (using the same cutoff value in decideTests function). I think I am supposed to use the first approach but I am surprised to see a big difference between the two approaches. could anyone help figure out the reasons?

Thanks,

Zhi Xie

NIH/NHLBI

Here eset is the expression dataset after RMA function.

Approach 1:

# Consider batch effects in the model matrix
design<-model.matrix(~0+condition.factor+batch.factor)

# fit the linear model
fit<-lmFit(eset,design)


Then I create contrast matrix and compute coefficients and errors using contrast.fit function

Approach 2:

# remove batch effects first
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor)

# only consider normal and disease conditions in the model matrix
design<-model.matrix(~0+condition.factor)

# fit the linear model
fit<-lmFit(eset.rm.batch,design)


where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

Microarray • 9.4k views
4
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

Approach 1 is correct (as I think you already know) and Approach 2 is wrong.

Approach 2 is wrong for two reasons. First, the batch effect and condition effects may overlap, and so removeBatchEffect() may have removed some of the condition effect with the batch effect. Second, approach 2 misleads lmFit() regarding the nature of the data and causes it to underestimate the residual degrees of freedom, for the reasons explained here: Removing continuous covariate effects in limma analysis

It may help understanding to consider a third approach:

Approach 3:

design1 <- model.matrix(~ 0 + condition.factor)
exp.eset.rm.batch <- removeBatchEffect(eset, batch.factor, design=design1)
design2 <- model.matrix(~ 0 + condition.factor + batch.factor)
fit <- lmFit(eset.rm.batch, design2)


Then Approach 3 will give identical DE results for comparisons between the conditions as does Approach 1.

1
Entering edit mode

Dear Dr. Smyth,

As Mike pointed out, here is the NOTE on help page of removeBatchEffect: "This function is not intended to be used prior to linear modelling. For linear modelling, it is better to include the batch factors in the linear model.".

In the example above (Answer to Xie Zhi), which you named Approach 3 , you have actually used removeBatchEffect function before linear modelling. Not only that, I have been following your answers to numerous questions on BioC and elsewhere where you have equivocally stated that this function should only be considered for plots like PCA, MDS etc and not for DGE. I am sure many including myself would be in dilemma at this moment. Could you please elaborate and clear the confusion?

Thank you

Sandip

2
Entering edit mode

Yes, I wrote the NOTE, and it still seems to me to give you perfectly clear advice.

I already told you in my answer above that Approach 2, which does not use removeBatchCorrect, is correct. In Approach 3, the removeBatchEffect() step is superfluous, but doesn't do any harm because the batch factor is included in the linear model as well, which is what the NOTE on the help page tells you to do. I included the 3rd approach in my answer just to point out that it gives the same results as Approach 1 for the conditions of interest, hoping that would have pedagogic value. Obviously one wouldn't use Approach 3 is practice because it is equivalent to Approach 1 but includes an extra unnecessary step, as well as making it harder to assess the importance of the batch adjustment.

0
Entering edit mode

That's what I thought. Thanks for clarification and that too so promptly.

Sandip

0
Entering edit mode

Hi, I am a little late to this conversation. Can you build a design matrix like the removeBatchEffect does and implement into the lmfit for DE analysis

contrasts(batch) <- contr.sum(levels(batch))

batch <- model.matrix(~batch)[, -1, drop = FALSE]

design <- model.matrix(~0+treatment)

design_matrix <- cbind(design, batch)

fit <- lmfit(matrix, design_matrix)

or would it just be more acceptable to do it the standard way

design <- model.matrix(~0+treatment+batch)

fit <- lmfit(matrix, design)

Thanks for any feedback! Leanne

0
Entering edit mode
@moshe-olshansky-2329
Last seen 8.1 years ago

Hi Zhi,

Check whether replacing your Approach 2 by:

design<-model.matrix(~0+condition.factor)
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor,design)
fit<-lmFit(eset.rm.batch,design)


where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

produces more consistent results (i.e. call removeBatchEffect with a third argument which is your design - normal versus disease).

Regards,
Moshe.

0
Entering edit mode

Hi Moshe, Hi Zhi,

I had some data with strong batch effects a while ago. I used several methods (e.g. adding the batch as factor in the modeling or Combat). In the end I decided to use Combat, which works quite nice. However, I wasn't aware of the removeBatchEffect() function so I just got curious. Here is a statement directly from the help page of this function: "This function is intended for use with clustering or PCA, not for use prior to linear modelling. If linear modelling is intended, it is better to include the batch effect as part of the linear model." So I guess the first approach might be the one to pick.

Cheers, Mike

0
Entering edit mode

Thanks Moshe and Mike!

I tried Moshe's suggestion and the results from the two approaches were very consistent. I also tested the data set using Combat, the batch effects were also removed as expected.

Zhi