ComBat: should the variable of interest be included in the model matrix?
maren.lang ▴ 10
@marenlang-7070
Last seen 9.3 years ago
Germany

Dear All,

I have a question regarding ComBat. I am unsure about what to include in the model matrix (mod).

Should I include only the variables that I want to correct for, or also the response variable (the variable of interest)?

 

If I call the help function for ComBat, I get the following information:

 

ComBat(dat, batch, mod, numCovs = NULL, par.prior = TRUE,
  prior.plots = FALSE)

 

mod

Model matrix for outcome of interest and other covariates besides batch

 

But the sva package vignette, http://www.bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf, under "6. Applying the ComBat function to adjust for known batches", says:

“Just as with sva, we then need to create a model matrix for the adjustment variables, but do not include the variable of interest. Note that you do not include batch in creating this model matrix - it will be included later in ComBat function. In this case there are no other adjustment variables so we simply fit an intercept term.

> modcombat = model.matrix(~1, data=pheno)”

 

It would be great if you could help me with that problem.

Thank you very much, kindest regards, Maren

 

@james-w-macdonald-5106
Last seen 11 hours ago
United States

See here:

sva::ComBat without covariate of interest?

@w-evan-johnson-5447
Last seen 6 months ago
United States

Maren, 

Sorry for the confusion on this. We have been having discussions on how to handle covariates in two-step procedures: e.g. (step 1) batch adjustment, followed by (step 2) significance testing.

The proper way to handle a two-step batch adjustment and significance test is as follows:

    - Step 1: Adjust for batch with ComBat and include any adjustment variables, including the covariate of interest, in the model matrix (mod).

    - Step 2: Use a modified F- or t-test for significance. For example:

        - The modified F statistic is F = ((rss0 - rss1)/(df1 - df0))/(rss1/(n - df1 - nbatches)), where rss0 is the residual sum of squared errors (SSE) of the reduced model, rss1 is the SSE of the full model, df0 and df1 are the numbers of parameters in the reduced and full models, n is the number of samples, and nbatches is the number of batches. Compare this statistic against an F distribution with df1 - df0 and n - df1 - nbatches degrees of freedom.
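As a rough illustration of Step 2, here is a minimal base-R sketch of the modified F statistic for a single feature. The helper name modified_f_test, the simulated pheno table, and the vector y (standing in for one row of ComBat-adjusted data) are all assumptions made for this example; they are not part of sva. The Step 1 call is shown only as a comment, with dat as a hypothetical features-by-samples matrix.

```r
# Hypothetical helper (not part of sva): modified F-test for one feature of
# batch-adjusted data, following the formula given in Step 2.
modified_f_test <- function(y, mod, mod0, nbatches) {
  n   <- length(y)     # number of samples
  df1 <- ncol(mod)     # number of parameters in the full model
  df0 <- ncol(mod0)    # number of parameters in the reduced model
  rss1 <- sum(lm.fit(mod,  y)$residuals^2)   # full model SSE
  rss0 <- sum(lm.fit(mod0, y)$residuals^2)   # reduced model SSE
  df_den <- n - df1 - nbatches               # denominator df, reduced by nbatches
  f <- ((rss0 - rss1) / (df1 - df0)) / (rss1 / df_den)
  p <- pf(f, df1 - df0, df_den, lower.tail = FALSE)
  c(F = f, p.value = p, df.num = df1 - df0, df.den = df_den)
}

# Simulated example (assumption): 12 samples, 2 batches, binary covariate of interest.
set.seed(1)
pheno <- data.frame(
  group = factor(rep(c("control", "case"), times = 6)),
  batch = factor(rep(c("b1", "b2"), each = 6))
)
mod  <- model.matrix(~ group, data = pheno)  # full model: includes the covariate of interest
mod0 <- model.matrix(~ 1, data = pheno)      # reduced model: intercept only

# Step 1 would look roughly like this, with `dat` a hypothetical
# features-by-samples expression matrix:
# adjusted <- sva::ComBat(dat = dat, batch = pheno$batch, mod = mod)

# Stand-in for one row of ComBat-adjusted data.
y <- rnorm(nrow(pheno)) + 2 * (pheno$group == "case")

res <- modified_f_test(y, mod, mod0, nbatches = nlevels(pheno$batch))
print(res)
```

In practice the helper would be applied to each row of the adjusted matrix. Compared with an ordinary F-test, the only difference is that the denominator degrees of freedom are reduced by nbatches.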

Publications in the literature discussing this issue are forthcoming and we will be changing the sva documentation to reflect this.

Thanks!

Evan
