Search
Question: remove microarray batch effects using Limma
4
gravatar for Xie, Zhi NIH/NHLBI [E]
7.1 years ago by
Xie, Zhi NIH/NHLBI [E]60 wrote:

Hi everyone,

I have some microarray data files containing two sets of samples in normal and disease condition. I have tested that the data also contain significant batch effects with hybridization time. However, the positive hits I obtained using the following approaches are very different (using the same cutoff value in decideTests function). I think I am supposed to use the first approach but I am surprised to see a big difference between the two approaches. could anyone help figure out the reasons?

Thanks,

Zhi Xie

NIH/NHLBI

Here eset is the expression dataset after RMA function.

Approach 1:

# Consider batch effects in the model matrix
design<-model.matrix(~0+condition.factor+batch.factor)

# fit the linear model
fit<-lmFit(eset,design)

Then I create contrast matrix and compute coefficients and errors using contrast.fit function

Approach 2:

# remove batch effects first
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor)

# only consider normal and disease conditions in the model matrix
design<-model.matrix(~0+condition.factor)

# fit the linear model
fit<-lmFit(eset.rm.batch,design)

where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

ADD COMMENTlink modified 5 weeks ago by Gordon Smyth32k • written 7.1 years ago by Xie, Zhi NIH/NHLBI [E]60
1
gravatar for Gordon Smyth
10 weeks ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

Approach 1 is correct (as I think you already know) and Approach 2 is wrong.

Approach 2 is wrong for two reasons. First, the batch effect and condition effects may overlap, and so removeBatchEffect() may have removed some of the condition effect with the batch effect. Second, approach 2 misleads lmFit() regarding the nature of the data and causes it to underestimate the residual degrees of freedom, for the reasons explained here: Removing continuous covariate effects in limma analysis

It may help understanding to consider a third approach:

Approach 3:

design1 <- model.matrix(~ 0 + condition.factor)
exp.eset.rm.batch <- removeBatchEffect(eset, batch.factor, design=design1)
design2 <- model.matrix(~ 0 + condition.factor + batch.factor)
fit <- lmFit(eset.rm.batch, design2)

Then Approach 3 will give identical DE results for comparisons between the conditions as does Approach 1.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Gordon Smyth32k

Dear Dr. Smyth,

As Mike pointed out, here is the NOTE on help page of removeBatchEffect: "This function is not intended to be used prior to linear modelling. For linear modelling, it is better to include the batch factors in the linear model.". 

In the example above (Answer to Xie Zhi), which you named Approach 3 , you have actually used removeBatchEffect function before linear modelling. Not only that, I have been following your answers to numerous questions on BioC and elsewhere where you have equivocally stated that this function should only be considered for plots like PCA, MDS etc and not for DGE. I am sure many including myself would be in dilemma at this moment. Could you please elaborate and clear the confusion?

Thank you

Sandip

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Sandip Darji0
1

In Approach 3, the removeBatchEffect() step is superfluous, but doesn't do any harm because the batch factor is included in the linear model as well, which is what the NOTE on the help page tells you to do. I included the 3rd approach in my answer just to point out that it gives the same results as Approach 1 for the conditions of interest, hoping that would have pedagogic value. Obviously one wouldn't use Approach 3 is practice because it is equivalent to Approach 1 but includes an extra unnecessary step, as well as making it harder to assess the importance of the batch adjustment.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Gordon Smyth32k

That's what I thought. Thanks for clarification and that too so promptly.

Sandip

ADD REPLYlink written 5 weeks ago by Sandip Darji0
0
gravatar for Moshe Olshansky
7.1 years ago by
Moshe Olshansky120 wrote:

Hi Zhi,

Check whether replacing your Approach 2 by:

design<-model.matrix(~0+condition.factor)
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor,design)
fit<-lmFit(eset.rm.batch,design)

where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

produces more consistent results (i.e. call removeBatchEffect with a third argument which is your design - normal versus disease).

Regards,
Moshe.

ADD COMMENTlink modified 10 weeks ago by Gordon Smyth32k • written 7.1 years ago by Moshe Olshansky120

Hi Moshe, Hi Zhi,

I had some data with strong batch effects a while ago. I used several methods (e.g. adding the batch as factor in the modeling or Combat). In the end I decided to use Combat, which works quite nice. However, I wasn't aware of the removeBatchEffect() function so I just got curious. Here is a statement directly from the help page of this function: "This function is intended for use with clustering or PCA, not for use prior to linear modelling. If linear modelling is intended, it is better to include the batch effect as part of the linear model." So I guess the first approach might be the one to pick.

Cheers, Mike

ADD REPLYlink modified 10 weeks ago by Gordon Smyth32k • written 7.1 years ago by Mike Walter230

Thanks Moshe and Mike!

I tried Moshe's suggestion and the results from the two approaches were very consistent. I also tested the data set using Combat, the batch effects were also removed as expected.

I appreciate your prompt replies.

Zhi

ADD REPLYlink modified 10 weeks ago by Gordon Smyth32k • written 7.1 years ago by Xie, Zhi NIH/NHLBI [E]60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 198 users visited in the last hour