ComBat covariate bias when building centroid based predictors
jmsendoya • 0
Last seen 2.5 years ago

Hi everyone!


I am working with microarray gene expression data to build a centroid-based classifier. This is my workflow:


1) To get an adequate sample size, I merged multiple datasets from several platforms. A PCA plot revealed an obvious dataset effect. After checking that my variable of interest was balanced among subjects (as recommended in …), I decided to use ComBat for batch effect correction. Based on the post-ComBat PCA, the correction worked well.


2) Then I randomly split the data into training and test sets and looked for differentially expressed genes in the training set using limma (with the model.matrix including my variable of interest plus batch info, as recommended in A: sva::ComBat without covariate of interest? and A: Method for batch correction).


3) The resulting genes were used to build a centroid classifier with the pamr package (…).
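
The three steps above can be sketched in R roughly as follows. This is a minimal illustration on simulated data, not the poster's actual code: the objects `expr`, `batch` and `group`, the 40/20 split, the choice of 30 genes and `threshold = 0` are all assumptions made for the example (in practice the threshold would usually be chosen with `pamr.cv`). As in the post, ComBat is run on all samples before the split.

```r
library(sva)    # ComBat
library(limma)  # lmFit, eBayes, topTable
library(pamr)   # pamr.train, pamr.predict

set.seed(1)

# Hypothetical stand-in for the merged data: 200 genes x 60 samples,
# three datasets (batches), two classes balanced within each batch.
batch <- factor(rep(c("A", "B", "C"), each = 20))
group <- factor(rep(rep(c("ctrl", "case"), each = 10), times = 3),
                levels = c("ctrl", "case"))
expr <- matrix(rnorm(200 * 60), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("s", 1:60)))
expr[1:20, group == "case"] <- expr[1:20, group == "case"] + 1.5   # class signal
expr <- sweep(expr, 2, c(0, 2, -1)[as.integer(batch)], "+")        # batch shift

# Step 1: batch correction on the full matrix, with the variable of
# interest passed as a covariate through `mod`.
adj <- ComBat(dat = expr, batch = batch, mod = model.matrix(~ group))

# Step 2: random split, then limma on the training samples only,
# with a design containing the variable of interest plus batch.
train <- sample(ncol(adj), size = 40)
test  <- setdiff(seq_len(ncol(adj)), train)
design <- model.matrix(~ group + batch,
                       data = data.frame(group, batch)[train, ])
fit <- eBayes(lmFit(adj[, train], design))
sel <- rownames(topTable(fit, coef = "groupcase", number = 30))

# Step 3: nearest shrunken centroid classifier on the selected genes.
pam_fit <- pamr.train(list(x = adj[sel, train], y = group[train]))
pred <- pamr.predict(pam_fit, newx = adj[sel, test], threshold = 0)

# Confusion table on the held-out samples.
table(predicted = pred, truth = group[test])
```

Note that in this layout the test samples have already passed through ComBat together with their class labels before the split is made.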




When using ComBat, you can either specify a covariate (i.e. your variable of interest) or not.


If I run ComBat in step 1) with my variable of interest specified as a covariate, as recommended by ComBat’s authors, the classifier performs very well on the test set, with an acceptable number of false positives and false negatives.


However, if I run ComBat without adjusting for any covariates, the classifier sucks.
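
For concreteness, the two ways of calling ComBat differ only in the `mod` argument. The sketch below uses simulated data with hypothetical names (`expr`, `batch`, `group`) and is only meant to show the two calls side by side, not to reproduce the results described above.

```r
library(sva)

set.seed(2)

# Hypothetical toy data: 100 genes x 24 samples, two batches,
# two classes balanced within each batch.
batch <- factor(rep(c("A", "B"), each = 12))
group <- factor(rep(rep(c("ctrl", "case"), each = 6), times = 2),
                levels = c("ctrl", "case"))
expr <- matrix(rnorm(100 * 24), nrow = 100,
               dimnames = list(paste0("g", 1:100), paste0("s", 1:24)))
expr <- sweep(expr, 2, c(0, 2)[as.integer(batch)], "+")   # batch shift

# Variant 1: variable of interest supplied as a covariate.
adj_with <- ComBat(dat = expr, batch = batch, mod = model.matrix(~ group))

# Variant 2: no covariates, only the batch labels are used.
adj_without <- ComBat(dat = expr, batch = batch, mod = NULL)

# Summary of the element-wise difference between the two corrected matrices.
summary(as.vector(adj_with - adj_without))
```

Variant 1 needs the class label of every sample that goes into the correction, whereas variant 2 needs only the batch labels.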


The problem is that in a "real world" sample my variable of interest will obviously be unknown, since predicting it is the whole point of the classifier, so I won’t be able to run ComBat with that variable as an adjustment covariate.


So, I don’t know what to do.


Any advice?


Thank you in advance!


combat sva pamr limma design matrix • 516 views
