Search
Question: remove microarray batch effects using Limma
0
gravatar for Xie, Zhi NIH/NHLBI [E]
6.9 years ago by
Xie, Zhi NIH/NHLBI [E]20 wrote:

Hi everyone,

I have some microarray data files containing two sets of samples in normal and disease condition. I have tested that the data also contain significant batch effects with hybridization time. However, the positive hits I obtained using the following approaches are very different (using the same cutoff value in decideTests function). I think I am supposed to use the first approach but I am surprised to see a big difference between the two approaches. could anyone help figure out the reasons?

Thanks,

Zhi Xie

NIH/NHLBI

Here eset is the expression dataset after RMA function.

Approach 1:

# Consider batch effects in the model matrix
design<-model.matrix(~0+condition.factor+batch.factor)

# fit the linear model
fit<-lmFit(eset,design)

Then I create contrast matrix and compute coefficients and errors using contrast.fit function

Approach 2:

# remove batch effects first
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor)

# only consider normal and disease conditions in the model matrix
design<-model.matrix(~0+condition.factor)

# fit the linear model
fit<-lmFit(eset.rm.batch,design)

where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

ADD COMMENTlink modified 12 days ago by Gordon Smyth31k • written 6.9 years ago by Xie, Zhi NIH/NHLBI [E]20
0
gravatar for Moshe Olshansky
6.9 years ago by
Moshe Olshansky120 wrote:

Hi Zhi,

Check whether replacing your Approach 2 by:

design<-model.matrix(~0+condition.factor)
exp.eset.rm.batch<-removeBatchEffect(exprs(eset),batch.factor,design)
fit<-lmFit(eset.rm.batch,design)

where eset.rm.batch is the expression dataset containing expression values from exp.eset.rm.batch table

produces more consistent results (i.e. call removeBatchEffect with a third argument which is your design - normal versus disease).

Regards,
Moshe.

ADD COMMENTlink modified 12 days ago by Gordon Smyth31k • written 6.9 years ago by Moshe Olshansky120

Hi Moshe, Hi Zhi,

I had some data with strong batch effects a while ago. I used several methods (e.g. adding the batch as factor in the modeling or Combat). In the end I decided to use Combat, which works quite nice. However, I wasn't aware of the removeBatchEffect() function so I just got curious. Here is a statement directly from the help page of this function: "This function is intended for use with clustering or PCA, not for use prior to linear modelling. If linear modelling is intended, it is better to include the batch effect as part of the linear model." So I guess the first approach might be the one to pick.

Cheers, Mike

ADD REPLYlink modified 12 days ago by Gordon Smyth31k • written 6.9 years ago by Mike Walter230

Thanks Moshe and Mike!

I tried Moshe's suggestion and the results from the two approaches were very consistent. I also tested the data set using Combat, the batch effects were also removed as expected.

I appreciate your prompt replies.

Zhi

ADD REPLYlink modified 12 days ago by Gordon Smyth31k • written 6.9 years ago by Xie, Zhi NIH/NHLBI [E]20
0
gravatar for Gordon Smyth
12 days ago by
Gordon Smyth31k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth31k wrote:

Approach 1 is correct (as I think you already know) and Approach 2 is wrong.

Approach 2 is wrong for two reasons. First, the batch effect and condition effects may overlap, and so removeBatchEffect() may have removed some of the condition effect with the batch effect. Second, approach 2 misleads lmFit() regarding the nature of the data and causes it to underestimate the residual degrees of freedom, for the reasons explained here: Removing continuous covariate effects in limma analysis

It may help understanding to consider a third approach:

Approach 3:

design1 <- model.matrix(~ 0 + condition.factor)
exp.eset.rm.batch <- removeBatchEffect(eset, batch.factor, design=design1)
design2 <- model.matrix(~ 0 + condition.factor + batch.factor)
fit <- lmFit(eset.rm.batch, design2)

Then Approach 3 will give identical DE results for comparisons between the conditions as does Approach 1.

ADD COMMENTlink modified 12 days ago • written 12 days ago by Gordon Smyth31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 105 users visited in the last hour