At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat
NewBioInfo ▴ 10
@a4756c38

I have a data matrix of 3264 by 23. The rows are genes and the columns are treatments. The treatments have unequal numbers of replicates, run in four batches. Examples of my files, filled with random numbers, are given below. My phenom file has 4 batches (of 6, 6, 6, and 5 samples) and 11 conditions. The code below has been modified for the example data.

dt.matrix (example table; image not preserved)

phenom (example table; image not preserved)

    library(sva)

    batch1 <- phenom$batch
    mod1 <- model.matrix(~conditions, data = phenom)
    combt.p <- ComBat(dat = dt.matrix,
                      mod = mod1,
                      batch = batch1,
                      par.prior = TRUE, prior.plots = TRUE)


# Error output
Found4batches
Adjusting for 3 covariate(s) or covariate level(s)
Error in ComBat(dat = dt.matrix, mod = mod1, batch = batch1, par.prior = TRUE,  : 
At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat 

# I then stepped through the ComBat source to find the step that fails. The check below is what triggers the error:

    (qr(design)$rank < ncol(design))
    TRUE

# For ComBat to proceed, this check should return FALSE. I also tried removing individual treatments and rerunning the code, but got the same result. Of the 15 data matrices I have, 4 give this error. I have no idea what is wrong with my files or code. Please help.

sessionInfo( )
ComBat sva • 7.1k views
@james-w-macdonald-5106

The issue you are having is the same as trying to solve this equation:

3 = y + x

There are infinitely many values of x and y for which that equation is true, so you can't unambiguously solve for either. You need another equation in order to solve for the two variables. The same thing happens when you have a rank-deficient design matrix: at least one column is a linear combination of the others, so you are trying to estimate more coefficients than the data can separate, and there is no unique solution.
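To make the analogy concrete, here is a minimal sketch in base R. The two-batch layout and the names `batch`, `condition`, and `design` are made up for illustration and are not the poster's data. Every batch-2 sample is condition B and condition B occurs nowhere else, so only the sum of the batch-2 effect and the condition-B effect can be determined, just like x + y above.

```r
# Hypothetical layout: condition B is run only in batch 2,
# and batch 2 contains nothing but condition B.
batch     <- factor(c(1, 1, 1, 2, 2, 2))
condition <- factor(c("A", "A", "A", "B", "B", "B"))

# One indicator column per batch, plus an indicator for condition B
design <- cbind(model.matrix(~ 0 + batch),
                conditionB = as.numeric(condition == "B"))

# The batch2 and conditionB columns are identical, so the matrix has
# three columns but only two independent ones.
qr(design)$rank                  # 2
ncol(design)                     # 3
qr(design)$rank < ncol(design)   # TRUE: rank deficient
```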

There is a function in the limma package called nonEstimable that will tell you which coefficients you cannot solve for. It's often not as simple as just removing the offending coefficients, because the remaining coefficients may no longer estimate what you think they estimate, so you might consider consulting a local statistician for help.
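As a rough illustration of what nonEstimable reports, the sketch below uses a made-up layout: three batches and four conditions, with condition D run only in batch 3 and batch 3 containing nothing else. The object names are hypothetical. Note that the function is given a design matrix, not the expression data.

```r
library(limma)

# Hypothetical sample sheet (not the poster's data)
batch     <- factor(c(1, 1, 1, 2, 2, 2, 3, 3))
condition <- factor(c("A", "B", "C", "A", "B", "C", "D", "D"))

# Design with both batch and condition terms
design <- model.matrix(~ batch + condition)

# The conditionD column equals the batch3 column, so it is flagged
nonEstimable(design)   # "conditionD"
is.fullrank(design)    # FALSE
```

A NULL result from nonEstimable means every column of the matrix it was given is estimable, which says nothing about a different matrix.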


Thank you, James, for your response. I tried limma, and it seems my data matrix is full rank.

    > library(limma)
    > nonEstimable(dt.matrix)
    NULL
    > is.fullrank(dt.matrix)
    TRUE

My stats knowledge is very limited, which is why I am struggling with this issue. Could you please point me in the right direction on what to check to get my data ready for ComBat? Thank you.


The error you get from ComBat comes from the test

qr(design)$rank < ncol(design)

And inside of nonEstimable is, as one might expect

    p <- ncol(x)
    QR <- qr(x)
    if (QR$rank < p)

Where x is your design matrix. That's the exact same test! So if you were getting an error from ComBat and now you aren't getting anything from nonEstimable, the only possibility is that you are using different design matrices.
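One way to check that you are testing the same matrix is to rebuild a ComBat-style design by hand. The sketch below assumes ComBat binds one indicator column per batch to `mod` and drops the all-ones intercept column before running the rank test. That is how the sva source reads as far as I can tell, so check it against your installed version. The `phenom` data frame here is a made-up stand-in, not the poster's file.

```r
# Hypothetical sample sheet standing in for `phenom` (values are made up)
phenom <- data.frame(
  batch      = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  conditions = factor(c("A", "B", "C", "A", "B", "C", "D", "D"))
)

# Cross-tabulate first: a condition confined to a batch that holds nothing
# else (here D in batch 3) cannot be separated from that batch.
print(table(phenom$batch, phenom$conditions))

# Assumed ComBat-style design: batch indicators plus the covariate model,
# with the all-ones intercept column removed.
batchmod     <- model.matrix(~ 0 + batch, data = phenom)
mod1         <- model.matrix(~ conditions, data = phenom)
design       <- cbind(batchmod, mod1)
is_intercept <- apply(design, 2, function(x) all(x == 1))
design       <- design[, !is_intercept, drop = FALSE]

# The same test quoted above
qr(design)$rank < ncol(design)   # TRUE for this toy layout

# Columns the QR decomposition pivots past the rank are the dependent ones
QR        <- qr(design)
dependent <- colnames(design)[QR$pivot[seq_len(ncol(design)) > QR$rank]]
dependent                        # "conditionsD"
```

Passing this `design` (rather than the expression matrix) to nonEstimable should then agree with what ComBat complains about.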


Thank you, James. I was indeed using the wrong matrix. nonEstimable did give me the name of the column causing the issue, but I do not know how to fix it. Simply removing the column, or the entire treatment set, didn't fix the issue. I have reached out to a stats professor here for help.
