Search
Question: remove microarray batch effects using Limma
4
8.0 years ago by
Xie, Zhi NIH/NHLBI [E]60 wrote:

Hi everyone,

I have some microarray data files containing two sets of samples in normal and disease condition. I have tested that the data also contain significant batch effects with hybridization time. However, the positive hits I obtained using the following approaches are very different (using the same cutoff value in decideTests function). I think I am supposed to use the first approach but I am surprised to see a big difference between the two approaches. could anyone help figure out the reasons?

Thanks,

Zhi Xie

NIH/NHLBI

Here eset is the expression dataset after RMA function.

Approach 1:

# Consider batch effects in the model matrix
design<-model.matrix(~0+condition.factor+batch.factor)

# fit the linear model
fit<-lmFit(eset,design)


Then I create contrast matrix and compute coefficients and errors using contrast.fit function

Approach 2:

# remove batch effects first
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor)

# only consider normal and disease conditions in the model matrix
design<-model.matrix(~0+condition.factor)

# fit the linear model
fit<-lmFit(eset.rm.batch,design)


where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

modified 12 months ago by Gordon Smyth35k • written 8.0 years ago by Xie, Zhi NIH/NHLBI [E]60
1
13 months ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

Approach 1 is correct (as I think you already know) and Approach 2 is wrong.

Approach 2 is wrong for two reasons. First, the batch effect and condition effects may overlap, and so removeBatchEffect() may have removed some of the condition effect with the batch effect. Second, approach 2 misleads lmFit() regarding the nature of the data and causes it to underestimate the residual degrees of freedom, for the reasons explained here: Removing continuous covariate effects in limma analysis

It may help understanding to consider a third approach:

Approach 3:

design1 <- model.matrix(~ 0 + condition.factor)
exp.eset.rm.batch <- removeBatchEffect(eset, batch.factor, design=design1)
design2 <- model.matrix(~ 0 + condition.factor + batch.factor)
fit <- lmFit(eset.rm.batch, design2)


Then Approach 3 will give identical DE results for comparisons between the conditions as does Approach 1.

Dear Dr. Smyth,

As Mike pointed out, here is the NOTE on help page of removeBatchEffect: "This function is not intended to be used prior to linear modelling. For linear modelling, it is better to include the batch factors in the linear model.".

In the example above (Answer to Xie Zhi), which you named Approach 3 , you have actually used removeBatchEffect function before linear modelling. Not only that, I have been following your answers to numerous questions on BioC and elsewhere where you have equivocally stated that this function should only be considered for plots like PCA, MDS etc and not for DGE. I am sure many including myself would be in dilemma at this moment. Could you please elaborate and clear the confusion?

Thank you

Sandip

1

In Approach 3, the removeBatchEffect() step is superfluous, but doesn't do any harm because the batch factor is included in the linear model as well, which is what the NOTE on the help page tells you to do. I included the 3rd approach in my answer just to point out that it gives the same results as Approach 1 for the conditions of interest, hoping that would have pedagogic value. Obviously one wouldn't use Approach 3 is practice because it is equivalent to Approach 1 but includes an extra unnecessary step, as well as making it harder to assess the importance of the batch adjustment.

That's what I thought. Thanks for clarification and that too so promptly.

Sandip

0
8.0 years ago by
Moshe Olshansky120 wrote:

Hi Zhi,

Check whether replacing your Approach 2 by:

design<-model.matrix(~0+condition.factor)
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor,design)
fit<-lmFit(eset.rm.batch,design)


where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

produces more consistent results (i.e. call removeBatchEffect with a third argument which is your design - normal versus disease).

Regards,
Moshe.

Hi Moshe, Hi Zhi,

I had some data with strong batch effects a while ago. I used several methods (e.g. adding the batch as factor in the modeling or Combat). In the end I decided to use Combat, which works quite nice. However, I wasn't aware of the removeBatchEffect() function so I just got curious. Here is a statement directly from the help page of this function: "This function is intended for use with clustering or PCA, not for use prior to linear modelling. If linear modelling is intended, it is better to include the batch effect as part of the linear model." So I guess the first approach might be the one to pick.

Cheers, Mike

ADD REPLYlink modified 13 months ago by Gordon Smyth35k • written 8.0 years ago by Mike Walter230

Thanks Moshe and Mike!

I tried Moshe's suggestion and the results from the two approaches were very consistent. I also tested the data set using Combat, the batch effects were also removed as expected.