Question: ComBat covariate bias when building centroid-based predictors
Posted 13 months ago by jmsendoya:

Hi everyone!


I am working with microarray gene expression data to build a centroid-based classifier. This is my workflow:


1) To get an adequate sample size, I merged multiple datasets from several platforms. A PCA plot revealed an obvious dataset (batch) effect. After checking that my variable of interest was balanced among subjects (as recommended in …), I decided to use ComBat for batch-effect correction. Judging by the post-ComBat PCA, the correction worked well.
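The core idea behind ComBat's batch correction in step 1 is a per-gene location/scale adjustment: each batch is centred and rescaled to a common mean and pooled variance. The sketch below shows only that idea. It is not ComBat: it omits the empirical Bayes shrinkage of the batch parameters and any covariates. The function name `location_scale_adjust` is hypothetical, the matrix is assumed to be genes x samples (the orientation ComBat uses), and every batch needs at least two samples.

```python
import numpy as np

def location_scale_adjust(X, batch):
    """Simplified per-gene location/scale batch adjustment (NOT full ComBat).

    X: genes x samples matrix (e.g. log2 expression).
    batch: batch label for each sample (column).
    No empirical Bayes shrinkage and no covariates are used.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    n = X.shape[1]
    levels = np.unique(batch)

    # Residuals after removing each batch's own per-gene mean.
    resid = np.empty_like(X)
    for b in levels:
        cols = batch == b
        resid[:, cols] = X[:, cols] - X[:, cols].mean(axis=1, keepdims=True)
    # Pooled within-batch standard deviation per gene.
    pooled_sd = np.sqrt((resid ** 2).sum(axis=1, keepdims=True) / (n - len(levels)))
    grand_mean = X.mean(axis=1, keepdims=True)

    adjusted = np.empty_like(X)
    for b in levels:
        cols = batch == b
        mu_b = X[:, cols].mean(axis=1, keepdims=True)
        sd_b = X[:, cols].std(axis=1, ddof=1, keepdims=True)
        # Standardize within the batch, then map onto the common mean and SD.
        adjusted[:, cols] = (X[:, cols] - mu_b) / sd_b * pooled_sd + grand_mean
    return adjusted
```

Because the batch means here are removed without any covariate, a class that is unevenly spread across batches would lose part of its signal in this step; that is the situation ComBat's covariate option is meant to protect against.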


2) Then I randomly split the data into a training set and a test set, and looked for differentially expressed genes in the training set using limma, with a model.matrix that includes my variable of interest plus the batch information (as recommended in the Biostars answers "sva::ComBat without covariate of interest?" and "Method for batch correction").
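Step 2 can be sketched as a random split followed by a per-gene linear model with design `~ group + batch`, fitted on the training columns only. This is a simplified stand-in for limma: it computes ordinary least-squares t-statistics for the group coefficient and does not apply limma's empirical Bayes variance moderation. The helpers `split_indices` and `group_t_stats`, the toy data, and the assumption of a two-level variable of interest are all hypothetical.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Random train/test split of sample indices (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return perm[n_test:], perm[:n_test]

def group_t_stats(X, group, batch):
    """Per-gene OLS t-statistics for a two-level group, design ~ group + batch.

    X is genes x samples. Unlike limma, no variance moderation is applied.
    """
    X = np.asarray(X, dtype=float)
    group = np.asarray(group)
    batch = np.asarray(batch)
    n = X.shape[1]
    levels = np.unique(group)
    if len(levels) != 2:
        raise ValueError("expected exactly two groups")
    columns = [np.ones(n), (group == levels[1]).astype(float)]
    # Treatment-coded batch indicators; the first batch is the reference.
    columns += [(batch == b).astype(float) for b in np.unique(batch)[1:]]
    design = np.column_stack(columns)
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = X @ design @ xtx_inv            # genes x parameters
    resid = X - coef @ design.T
    df = n - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=1) / df
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    return coef[:, 1] / se                 # t for levels[1] vs levels[0]

# Toy usage: the model only ever sees the training columns.
rng = np.random.default_rng(1)
group = np.tile(["ctrl", "case"], 20)
batch = np.repeat(["b1", "b2", "b3", "b4"], 10)
X = rng.normal(size=(20, 40))
X[0, group == "case"] += 5.0               # gene 0 is truly differentially expressed
train, test = split_indices(40, test_fraction=0.25, seed=2)
t = group_t_stats(X[:, train], group[train], batch[train])
top_genes = np.argsort(-np.abs(t))[:5]
```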


3) I used the resulting genes to build a centroid classifier with the pamr package (…).
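pamr implements the nearest shrunken centroid method of Tibshirani et al. (2002). The numpy sketch below follows that published recipe in simplified form: class centroids are standardized against the overall centroid, soft-thresholded by `delta`, and new samples are assigned to the class with the smallest discriminant score. It does not reproduce pamr's API or its cross-validation of the threshold; the function names and the fixed `delta` are hypothetical.

```python
import numpy as np

def fit_shrunken_centroids(X, y, delta):
    """Simplified nearest shrunken centroids (after Tibshirani et al. 2002).

    X: genes x samples; y: class label per sample; delta: shrinkage threshold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, K = X.shape[1], len(classes)
    overall = X.mean(axis=1)
    centroids = np.column_stack([X[:, y == c].mean(axis=1) for c in classes])
    # Pooled within-class standard deviation per gene, plus the median offset s0.
    within_ss = sum(((X[:, y == c] - centroids[:, [k]]) ** 2).sum(axis=1)
                    for k, c in enumerate(classes))
    s = np.sqrt(within_ss / (n - K))
    s0 = np.median(s)
    n_k = np.array([(y == c).sum() for c in classes])
    m = np.sqrt(1.0 / n_k - 1.0 / n)
    scale = (s + s0)[:, None] * m[None, :]          # genes x classes
    d = (centroids - overall[:, None]) / scale
    # Soft-threshold the standardized differences toward zero.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    return {
        "classes": classes,
        "centroids": overall[:, None] + scale * d_shrunk,
        "sd": s + s0,
        "log_priors": np.log(n_k / n),
        "selected": np.flatnonzero(np.any(d_shrunk != 0, axis=1)),
    }

def predict_shrunken_centroids(model, X_new):
    """Assign each column of X_new to the class with the smallest discriminant score."""
    X_new = np.asarray(X_new, dtype=float)
    diff = (X_new[:, :, None] - model["centroids"][:, None, :]) / model["sd"][:, None, None]
    scores = (diff ** 2).sum(axis=0) - 2.0 * model["log_priors"][None, :]
    return model["classes"][np.argmin(scores, axis=1)]
```

Genes whose shrunken differences are zero for every class contribute equally to all class scores, so they drop out of the decision; `selected` lists the genes that remain. In practice the threshold would be tuned on the training set only, and test samples are passed only to the prediction step.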




When using ComBat, you can either specify a covariate (i.e. your variable of interest) or not.


If I run ComBat in step 1 with my variable of interest as a covariate, as ComBat's authors recommend, the classifier performs very well on the test set, with an acceptable number of false positives and false negatives.


However, if I run ComBat without adjusting for any covariate, the classifier performs poorly.
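The mechanical difference between the two runs can be sketched with a mean-only, least-squares version of the adjustment. When a covariate is supplied (in sva::ComBat, through the `mod` model matrix), batch effects are estimated jointly with the group effect and only the batch part is removed, so the group label of every sample being adjusted is used. This is not ComBat (no scale adjustment, no empirical Bayes); the function `remove_batch_means` and the noise-free toy data are hypothetical, and the group imbalance across batches is deliberately exaggerated to make the effect visible.

```python
import numpy as np

def remove_batch_means(X, batch, group=None):
    """Mean-only batch removal by per-gene least squares (needs >= 2 batches).

    With group: fit ~ group + batch and subtract only the batch coefficients.
    Without group: fit ~ batch and subtract the batch coefficients.
    Batches are aligned to the first batch level; no scale adjustment.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    n = X.shape[1]
    B = np.column_stack([(batch == b).astype(float) for b in np.unique(batch)[1:]])
    cols = [np.ones(n)]
    if group is not None:
        group = np.asarray(group)
        cols += [(group == g).astype(float) for g in np.unique(group)[1:]]
    design = np.column_stack(cols + [B])
    beta, *_ = np.linalg.lstsq(design, X.T, rcond=None)   # parameters x genes
    batch_effect = B @ beta[-B.shape[1]:]                  # samples x genes
    return X - batch_effect.T

# Toy data: true group effect 2, batch effect 5, group unbalanced across batches.
batch = np.repeat(["b1", "b2"], 20)
group = np.array(["case"] * 16 + ["ctrl"] * 4 + ["case"] * 4 + ["ctrl"] * 16)
X = np.zeros((3, 40)) + 2.0 * (group == "case") + 5.0 * (batch == "b2")

def group_diff(Y):
    return Y[:, group == "case"].mean(axis=1) - Y[:, group == "ctrl"].mean(axis=1)

with_cov = remove_batch_means(X, batch, group)   # requires labels for every sample
no_cov = remove_batch_means(X, batch)
print(round(float(group_diff(with_cov)[0]), 2))  # → 2.0
print(round(float(group_diff(no_cov)[0]), 2))    # → 1.28
```

With the covariate, the group difference is preserved, but only because the labels of all adjusted samples were supplied; without it, part of the group signal is absorbed into the batch estimates.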


The problem is that for a real-world sample my variable of interest will obviously be unknown, since that is exactly what I want to predict with the classifier, so I won't be able to run ComBat with that variable as a covariate.


So, I don’t know what to do.


Any advice?


Thank you in advance!

