At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat
NewBioInfo • 0

I have a 3264 by 23 data matrix. The rows are genes and the columns are treatments. The treatments have unequal numbers of replicates, run in four batches. Examples of my files, with random numbers, are given below. My phenom file has 4 batches (of 6, 6, 6, and 5 samples) and 11 conditions. The code below has been modified for the example data.


    library(sva)

    batch1 <- phenom$batch
    mod1 <- model.matrix(~conditions, data = phenom)
    combt.p <- ComBat(dat = dt.matrix,
                      mod = mod1,
                      batch = batch1,
                      par.prior = TRUE, prior.plots = TRUE)

Error output:

    Adjusting for 3 covariate(s) or covariate level(s)
    Error in ComBat(dat = dt.matrix, mod = mod1, batch = batch1, par.prior = TRUE,  :
    At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat

I then went into the ComBat source code to look for the step that fails. I found that the check below is what triggers the error:

    (qr(design)$rank < ncol(design))

This check was supposed to return FALSE. I even tried removing individual treatments and re-running the code, but got the same result. Of the 15 data matrices I have, 4 give this error. I have no idea what is wrong with my files or code. Please help.

sessionInfo( )

The issue you are having is the same as trying to solve this equation:

3 = y + x

There are infinitely many values of x and y for which that equation is true, so you can't unambiguously solve for either. You need another equation in order to solve for the two variables. The same thing happens when you have a rank-deficient design matrix: some columns are linear combinations of others, so you are trying to estimate more coefficients than the data can tell apart, and there is no unique solution.
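
To make the analogy concrete, here is a minimal sketch in base R. The batch and condition labels are made up for illustration (they are not the poster's data): condition C occurs only in batch 2, and batch 2 contains only condition C, so the two effects cannot be separated.

    # Toy example (hypothetical data): condition C is perfectly
    # confounded with batch 2.
    batch      <- factor(c(1, 1, 1, 1, 2, 2))
    conditions <- factor(c("A", "A", "B", "B", "C", "C"))

    # Columns: (Intercept), batch2, conditionsB, conditionsC
    design <- model.matrix(~ batch + conditions)

    # batch2 and conditionsC are identical columns, so the rank is
    # smaller than the number of columns.
    rank_deficient <- qr(design)$rank < ncol(design)
    print(rank_deficient)

Any split of the batch 2 / condition C effect between those two columns fits the data equally well, which is the same situation as 3 = y + x.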

There is a function in the limma package, called nonEstimable, that will tell you which coefficients you cannot solve for. It's often not so simple as just removing the offending coefficients, as the remaining coefficients may no longer estimate what you think they estimate, so you might consider consulting a local statistician for help.
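
A short sketch of how nonEstimable might be called, assuming limma is installed. The toy batch and condition labels are hypothetical, chosen so that one condition is confounded with one batch.

    library(limma)

    # Hypothetical labels: condition C only ever appears in batch 2.
    batch      <- factor(c(1, 1, 1, 1, 2, 2))
    conditions <- factor(c("A", "A", "B", "B", "C", "C"))
    design <- model.matrix(~ batch + conditions)

    # Returns the names of coefficients that cannot be estimated,
    # or NULL when the design matrix is of full rank.
    ne <- nonEstimable(design)
    print(ne)

    # A design with conditions only is of full rank here.
    ne_cond_only <- nonEstimable(model.matrix(~ conditions))
    print(is.null(ne_cond_only))

Note that the function names only one column of a dependent set; the column it names is not necessarily the one that is scientifically "wrong".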

Entering edit mode

Thank you, James, for your response. I tried limma, and it seems my data matrix is of full rank.


My stats knowledge is very limited, which is why I am struggling with this issue. Could you please point me in the right direction as to what I can test to get my data ready for ComBat? Thank you.


The error you get from ComBat comes from the test

    qr(design)$rank < ncol(design)

And inside of nonEstimable is, as one might expect

    p <- ncol(x)
    QR <- qr(x)
    if (QR$rank < p)

Where x is your design matrix. That's the exact same test! So if you were getting an error from ComBat and now you aren't getting anything from nonEstimable, the only possibility is that you are using different design matrices.
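
One way to end up with different matrices is to test mod on its own. The sketch below rests on an assumption about ComBat's internals (based on a reading of the sva source, so please check it against your installed version): ComBat binds batch indicator columns to mod and drops the all-ones intercept column before running the rank test. The phenom table here is hypothetical.

    # Hypothetical phenotype table; substitute your own phenom.
    phenom <- data.frame(
      batch      = factor(c(1, 1, 1, 1, 2, 2)),
      conditions = factor(c("A", "A", "B", "B", "C", "C"))
    )

    # A cross-tabulation shows which conditions occur in which batch.
    print(table(phenom$batch, phenom$conditions))

    mod <- model.matrix(~ conditions, data = phenom)

    # Assumption: this mirrors how ComBat combines batch and mod.
    batchmod     <- model.matrix(~ -1 + batch, data = phenom)
    design       <- cbind(batchmod, mod)
    is_intercept <- apply(design, 2, function(x) all(x == 1))
    design       <- as.matrix(design[, !is_intercept])

    # mod alone can be of full rank while the combined design is not.
    mod_full_rank    <- qr(mod)$rank == ncol(mod)
    design_full_rank <- qr(design)$rank == ncol(design)
    print(c(mod = mod_full_rank, combined = design_full_rank))

In this toy case mod passes the test but the combined matrix fails it, so a check on mod alone would miss the confounding with batch.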


Thank you, James. I was indeed using the wrong matrix. nonEstimable did give me the name of the column that was causing the issue, but I do not know how to fix it. Simply removing the column or the entire treatment set didn't fix the issue. I have reached out to a stats professor here for help.

