Adjust for multiple batches using ComBat
juls • 0
@juls-11275

Dear all,

I have patient data (microarray data, more than 100 samples, very noisy), and as always there are many factors (disease/control, infection, age, sex, treatment, cohort, pmi, batch/scan date).

So my question is basically a general question about ComBat. I am interested in two biological variables: disease/control and infection. Should I, or can I, correct for the other factors by running ComBat sequentially? If so, in what order? The order does influence the outcome.

Or is multiple batch correction basically overfitting the data?

Thank you very much!

Best,

Julia

combat sva limma
Aaron Lun ★ 27k
@alun

I'll describe what I would do for removeBatchEffect; I imagine that a similar procedure could be used for ComBat. Basically, I would put every uninteresting factor into an additive formula in model.matrix, and use the resulting matrix as input to the covariates argument. At the same time, I would use the interesting factors to construct a normal design matrix and supply that as design. Running removeBatchEffect should then yield an expression matrix with the uninteresting factors regressed out. You might get a warning about unestimable coefficients for some of the terms in the uninteresting design matrix; this can be ignored.
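
To make the idea concrete, here is a minimal NumPy sketch of the joint regression described above. It is not limma's code: the function name remove_nuisance and the toy data are hypothetical, and the sketch assumes the combined matrix has full column rank and that the nuisance matrix carries no intercept column.

```python
import numpy as np

def remove_nuisance(expr, design, nuisance):
    """Regress nuisance columns out of expr (genes x samples).

    design   : samples x p, effects to keep (intercept, disease, ...)
    nuisance : samples x q, effects to remove, WITHOUT an intercept column
    """
    full = np.hstack([design, nuisance])
    # Joint least-squares fit for all genes at once: coef is (p + q) x genes.
    coef, *_ = np.linalg.lstsq(full, expr.T, rcond=None)
    beta_nuisance = coef[design.shape[1]:, :]
    # Subtract only the fitted nuisance part; the design effects stay in.
    return expr - (nuisance @ beta_nuisance).T

# Toy data (hypothetical): 12 samples, 3 scan batches, 2 sexes, 2 genes.
disease = np.tile([0.0, 1.0], 6)
sex = np.tile([0.0, 0.0, 1.0, 1.0], 3)
batch = np.repeat([0, 1, 2], 4)
batch_dummies = np.eye(3)[batch][:, 1:]   # drop the reference batch

design = np.column_stack([np.ones(12), disease])
nuisance = np.column_stack([batch_dummies, sex])

signal = np.outer([2.0, -1.0], disease) + 5.0   # genes x samples, effects to keep
unwanted = np.outer([3.0, 0.5], batch) + np.outer([1.5, -2.0], sex)
corrected = remove_nuisance(signal + unwanted, design, nuisance)
```

Because the toy data are noise-free and the factors are not confounded, corrected equals signal up to floating-point error. With real data the fit is only an estimate, and the corrected matrix is best used for visualisation or clustering rather than as input to further linear modelling.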

Obviously, this assumes that the uninteresting factors are not confounded with your interesting ones. If they are confounded, the warning I mention above will be relevant, as either no correction will be performed (if qr chooses to pivot out the uninteresting factor) or the effect of the interesting factor will be regressed out with the uninteresting one. I can also imagine that the correction would not be stable if you have few residual degrees of freedom, i.e., you get overfitting because the number of factors is close to the number of samples. In such cases, more selectivity would be required when choosing the uninteresting factors to regress out.
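
One way to check both conditions before correcting is to compare matrix ranks. The sketch below is an illustration only: the helper name column_space_overlap and the six-sample example are hypothetical, and the nuisance matrix is again assumed to have no intercept column (otherwise it would always overlap with the design's intercept).

```python
import numpy as np

def column_space_overlap(design, nuisance):
    """Return (overlap, residual_df) for the combined model.

    overlap     : dimension shared by the two column spaces; > 0 means
                  some nuisance effect cannot be separated from the design
    residual_df : samples minus rank of the combined matrix
    """
    rank = np.linalg.matrix_rank
    full = np.hstack([design, nuisance])
    overlap = rank(design) + rank(nuisance) - rank(full)
    residual_df = full.shape[0] - rank(full)
    return int(overlap), int(residual_df)

disease = np.array([0.0, 0, 0, 1, 1, 1])
design = np.column_stack([np.ones(6), disease])

cohort = disease.copy().reshape(-1, 1)                 # cohort mirrors disease exactly
sex = np.array([0.0, 1, 0, 1, 0, 1]).reshape(-1, 1)    # sex varies within each group

confounded = column_space_overlap(design, cohort)   # (1, 4)
clean = column_space_overlap(design, sex)           # (0, 3)
```

A non-zero overlap flags the confounded case, and a small residual_df relative to the number of samples flags the overfitting risk.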


I have done something similar to what you described.
ComBat cannot take multiple batches at the same time, so I used it sequentially. At each step I gave it one batch to correct for and a design matrix containing my interesting factors plus the remaining uninteresting factors as covariates. In the next step I omitted from the design matrix the uninteresting factor I had just corrected for, and repeated this until only the interesting factors were left.
But thanks! I will give removeBatchEffect a try.


Iterative regression is probably a bad idea if the uninteresting factors are not all orthogonal to the interesting ones. Earlier iterations would not include relevant factors in the model, resulting in biased estimation of the terms that were included. This becomes an issue if you try to use the biased coefficients for regression. The variance in earlier iterations would also be inflated by the effects of missing factors, which might interfere with ComBat's empirical Bayes shrinkage.
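
The bias can be seen in a small noise-free NumPy example. This is a sketch under stated assumptions, not ComBat: it uses plain least squares with no empirical Bayes shrinkage, the factor layout is hypothetical, and each sequential pass deliberately leaves the other nuisance factor out of the model.

```python
import numpy as np

def remove_nuisance(expr, design, nuisance):
    """Fit design + nuisance jointly, subtract only the nuisance part."""
    full = np.hstack([design, nuisance])
    coef, *_ = np.linalg.lstsq(full, expr, rcond=None)
    return expr - nuisance @ coef[design.shape[1]:]

# Eight samples; batch and cohort are correlated with each other,
# but neither is confounded with disease (hypothetical layout).
disease = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])
batch = np.array([0.0, 0, 1, 1, 0, 0, 1, 1])
cohort = np.array([0.0, 0, 0, 1, 0, 1, 1, 1])
design = np.column_stack([np.ones(8), disease])

truth = 5.0 + 2.0 * disease                  # what should remain
y = truth + 3.0 * batch + 4.0 * cohort       # one gene, no noise

# Joint correction: both nuisance factors in one model.
joint = remove_nuisance(y, design, np.column_stack([batch, cohort]))

# Sequential correction: each pass ignores the other nuisance factor.
step1 = remove_nuisance(y, design, batch[:, None])
sequential = remove_nuisance(step1, design, cohort[:, None])

joint_error = np.abs(joint - truth).max()
sequential_error = np.abs(sequential - truth).max()
```

Here joint_error is zero up to floating-point error, while sequential_error is large: the first pass attributes part of the cohort effect to batch, and the second pass cannot undo that.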


Thanks again! Running ComBat sequentially was suggested previously as a workaround for ComBat not being able to handle multiple batches, but I did not feel very confident about it.