ComBat: should the variable of interest be included in the model matrix?
maren.lang ▴ 10

Dear All,

I have a question regarding ComBat. I am unsure what to include in the model matrix (mod).

Do I have to insert only those variables that I want to correct for or also the response variable?


If I call the help page for ComBat, I get this information:

ComBat(dat, batch, mod, numCovs = NULL, par.prior = TRUE, prior.plots = FALSE)

mod: Model matrix for outcome of interest and other covariates besides batch


But in the sva package, under "6. Applying the ComBat function to adjust for known batches", you can read:

“Just as with sva, we then need to create a model matrix for the adjustment variables, but do not include the variable of interest. Note that you do not include batch in creating this model matrix - it will be included later in ComBat function. In this case there are no other adjustment variables so we simply fit an intercept term.

> modcombat = model.matrix(~1, data=pheno)”


It would be great if you could help me with that problem.

Thank you very much, kindest regards, Maren



See here:

sva::ComBat without covariate of interest?



Sorry for the confusion on this. We have been having discussions on how to handle covariates in two-step procedures, e.g. (step 1) batch adjustment, followed by (step 2) significance testing.

The proper way to handle a two-step batch adjustment/significance test is as follows:

    - Step 1: Adjust for batch with ComBat and include any adjustment variables in the model matrix, including the covariate of interest.

    - Step 2: Use a modified F- or t-test for significance. For example:

        - The modified F statistic is F = ((rss0 - rss1)/(df1 - df0))/(rss1/(n - df1 - nbatches)), where rss0 is the residual sum of squared errors (SSE) of the reduced model, rss1 is the SSE of the full model, df0 and df1 are the numbers of parameters in the reduced and full models, n is the number of samples, and nbatches is the number of batches. Compare this against an F distribution with df1 - df0 and n - df1 - nbatches degrees of freedom.
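As a rough illustration of Step 2, here is a minimal base-R sketch of the modified F-test for a single feature (e.g. one gene) of already batch-adjusted data. The function name `modified_f_test` and its arguments (`y`, `mod_full`, `mod_reduced`) are hypothetical and not part of sva, and the toy data are simulated only to show the call. It assumes both model matrices have full column rank.

```r
# Sketch of the modified F-test described above, for one feature.
# modified_f_test, y, mod_full and mod_reduced are hypothetical names,
# not part of the sva package.
modified_f_test <- function(y, mod_full, mod_reduced, nbatches) {
  n   <- length(y)
  df1 <- ncol(mod_full)     # number of parameters in the full model
  df0 <- ncol(mod_reduced)  # number of parameters in the reduced model

  # Residual sums of squares from ordinary least-squares fits
  rss1 <- sum(lm.fit(mod_full, y)$residuals^2)
  rss0 <- sum(lm.fit(mod_reduced, y)$residuals^2)

  # Denominator degrees of freedom are reduced by the number of batches
  df_num <- df1 - df0
  df_den <- n - df1 - nbatches

  f_stat <- ((rss0 - rss1) / df_num) / (rss1 / df_den)
  p_val  <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
  list(f = f_stat, df = c(df_num, df_den), p.value = p_val)
}

# Toy example: 12 samples, two groups, assumed to come from 2 batches
set.seed(1)
group <- factor(rep(c("a", "b"), each = 6))
y     <- rnorm(12) + 2 * (group == "b")

res <- modified_f_test(
  y,
  mod_full    = model.matrix(~ group),
  mod_reduced = model.matrix(~ 1, data = data.frame(group)),
  nbatches    = 2
)
```

With two batches, the only difference from an ordinary F-test is that the denominator uses n - df1 - nbatches (here 12 - 2 - 2 = 8) degrees of freedom instead of n - df1, so the statistic is somewhat smaller and the test more conservative.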

Publications in the literature discussing this issue are forthcoming and we will be changing the sva documentation to reflect this.



