Question: How to remove more than two batches in my data-sets
3.0 years ago by
yi.huang30
yi.huang30 wrote:

Hi all,

I've used the removeBatchEffect function from the limma package for my previous datasets, but it seems to be limited to two batches only.

I'm now working with a dataset that contains more than two batches. How can I handle this?

Many thanks,

Yi

Answer: How to remove more than two batches in my data-sets
3.0 years ago by
Gordon Smyth
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth wrote:

removeBatchEffect() is not limited to two batches. It works just the same with any number of batches. For example, if you have three batches (A, B, C) you might use:

batch <- c("A","A","A","B","B","B","C","C","C")
removeBatchEffect(y, batch)

or

removeBatchEffect(y, batch, design=design)
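To make the three-batch case concrete, here is a minimal sketch on simulated data. The matrix `y`, the `shift` values and the `batch_means` helper are made up purely for illustration; only `removeBatchEffect()` itself comes from limma.

```r
library(limma)

set.seed(1)
batch <- c("A","A","A","B","B","B","C","C","C")

# Hypothetical data: 5 genes x 9 samples of noise plus a per-batch shift
shift <- c(A = 0, B = 2, C = -1)
y <- matrix(rnorm(5 * 9, sd = 0.1), nrow = 5) +
  matrix(shift[batch], nrow = 5, ncol = 9, byrow = TRUE)

corrected <- removeBatchEffect(y, batch)

# Helper (illustrative only): per-gene mean within each batch
batch_means <- function(m) {
  sapply(split(seq_along(batch), batch), function(i) rowMeans(m[, i]))
}

before <- batch_means(y)          # columns A, B, C differ by roughly the shifts
after  <- batch_means(corrected)  # columns A, B, C should agree for each gene
```

With no `design` supplied, the only thing fitted besides the batch effects is an overall mean, so the per-batch means of each gene should coincide after correction.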

Strangely, the same question has been asked before: Batch effect removal

Edit 24 hours later:

If your need is actually to handle more than 2 batch factors, rather than just more than 2 batches, then (as suggested by Steve) this can be achieved using the covariates argument to removeBatchEffect(). Suppose you have three batch factors:

contrasts(batch1) <- contr.sum(levels(batch1))
contrasts(batch2) <- contr.sum(levels(batch2))
contrasts(batch3) <- contr.sum(levels(batch3))
covariates <- model.matrix(~batch1+batch2+batch3)
covariates <- covariates[,-1]

Then you can correct by

removeBatchEffect(y, covariates=covariates, design=design)
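For anyone who wants to try this end to end, here is a small self-contained sketch. The sample layout (16 samples in a full factorial), the factor levels and the random `y` are all invented for illustration; the steps themselves follow the recipe above.

```r
library(limma)

set.seed(2)
# Hypothetical layout: 16 samples, one treatment factor and three batch factors
group  <- factor(rep(c("ctl", "trt"), times = 8))
batch1 <- factor(rep(c("a", "b"), each = 8))
batch2 <- factor(rep(rep(c("x", "y"), each = 4), times = 2))
batch3 <- factor(rep(rep(c("p", "q"), each = 2), times = 4))
y <- matrix(rnorm(6 * 16), nrow = 6)   # 6 genes x 16 samples

# The effect to preserve goes in the design matrix
design <- model.matrix(~group)

# Sum-to-zero coding for every batch factor, then drop the intercept column
contrasts(batch1) <- contr.sum(levels(batch1))
contrasts(batch2) <- contr.sum(levels(batch2))
contrasts(batch3) <- contr.sum(levels(batch3))
covariates <- model.matrix(~batch1 + batch2 + batch3)[, -1]

corrected <- removeBatchEffect(y, covariates = covariates, design = design)

# Sanity check: refit with the batch columns included and look at their
# coefficients, which should now be numerically zero
refit <- lmFit(corrected, cbind(design, covariates))
leftover <- max(abs(refit$coefficients[, -seq_len(ncol(design))]))
```

The refit at the end is only a sanity check: if the batch terms were removed, `leftover` should be zero up to rounding error.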

My 2 cents:

The (common) confusion that you're pointing out is likely due to the easy-to-understand description of the batch and batch2 parameters, compared with the (somewhat) cryptic description of the covariates parameter.

Presumably you mean that one could control for an arbitrary number of batches by creating a covariates design matrix that encodes these batches, but the average lay person will have no intuition on how to do that.

Furthermore, there are no examples in ?removeBatchEffect that can help shed light on the situation other than the example using the single batch parameter.

No, that isn't what I mean at all. I think you are confusing batch-factors with batches, and perhaps that is OP's misunderstanding as well.

removeBatchEffect() accepts only two batch factors directly (batch and batch2), but each factor can have an arbitrary number of levels, just like any factor in any R function. Each level corresponds to a batch. So removeBatchEffect() naturally handles an arbitrary number of batches even without the batch2 or covariates arguments.

The batch2 and covariates arguments are for more complex situations where there is an additive structure of batch effects from multiple sources.
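As a rough sketch of that additive case with two batch factors, the example below corrects once with batch and batch2 and once with an equivalent covariates matrix. The data and the "day"/"instrument" interpretation are hypothetical, and the claim that the two calls agree is based on my reading of how limma codes the batch arguments internally (sum-to-zero contrasts), so treat it as an assumption to check on your installed version.

```r
library(limma)

set.seed(3)
# Hypothetical batch factors: three processing days, two instruments
batch  <- factor(rep(c("A", "B", "C"), each = 4))
batch2 <- factor(rep(c("x", "y"), times = 6))
y <- matrix(rnorm(4 * 12), nrow = 4)   # 4 genes x 12 samples

# Route 1: pass the two factors directly
via_args <- removeBatchEffect(y, batch = batch, batch2 = batch2)

# Route 2: build the same additive structure by hand as covariates
contrasts(batch)  <- contr.sum(levels(batch))
contrasts(batch2) <- contr.sum(levels(batch2))
covariates <- model.matrix(~batch + batch2)[, -1]
via_cov <- removeBatchEffect(y, covariates = covariates)

# Compare the two corrected matrices, ignoring dimnames
same <- isTRUE(all.equal(via_args, via_cov, check.attributes = FALSE))
```

If `same` is TRUE, the covariates route is just a generalisation of batch plus batch2, which is why it extends to a third or fourth batch factor.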

There we have it: dollars to donuts it's a terminology thing, then, as you point out.

My bet is that the OP is asking about how to control for > 2 batch factors (since that's how I interpreted the question ;-), since it's pretty straightforward to see (and try to test) how the batch and batch2 parameters can have > 2 unique categorical values (levels) ... but let's see.

modified 3.0 years ago by Gordon Smyth • written 3.0 years ago by Steve Lianoglou

I bet they only have one factor. (Later: but I was wrong.)


In service to my fellow mere mortals who will be trying to grok what contr.sum is doing in Gordon's updated answer, you can start by reading through this tutorial on contrast coding schemes for categorical variables.
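A quick way to see what contr.sum does is to print it next to R's default treatment coding. This uses base R only; the three-level factor `f` is a made-up example.

```r
lev <- c("A", "B", "C")

# Default coding: level A is the baseline, B and C get indicator columns
trt <- contr.treatment(lev)

# Sum-to-zero coding: the last level is coded -1 in every column,
# so each column sums to zero across levels
sm <- contr.sum(lev)

print(trt)
print(sm)

# The same coding as it appears in a model matrix for a hypothetical factor
f <- factor(c("A", "A", "B", "B", "C", "C"))
contrasts(f) <- contr.sum(levels(f))
mm <- model.matrix(~f)
print(mm)
```

Because every contrast column sums to zero over the levels, the intercept corresponds to the average over the levels rather than to one reference batch. Dropping the intercept column, as in `covariates[,-1]`, then leaves only the batch-specific deviations.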

Hi Gordon and Steve,

Thanks for your replies. Sorry for the poor description of my question.

What I mean is that there are more than two batch "factors".

For example, how can I handle the batch "factors" when the design is ~ batch1+batch2+batch3+factor1+factor2?

removeBatchEffect seems to be limited to "batch" and "batch2".