Possible bug in sva package regarding ComBat and use of numCovs
@djie-tjwan-thung-5053
Last seen 10.2 years ago
Dear list and authors of SVA / ComBat,

I might have stumbled upon a small bug in the ComBat function:

    ComBat(dat, batch, mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)

mod should contain a model matrix for the outcome of interest and any other covariates besides batch, including numerical covariates. However, it seems that all numerical covariates must be in the last columns of the model matrix and all categorical covariates in the first columns.

For example:

    # Using this mod variable as argument to ComBat works
    mod <- model.matrix(~as.factor(pheno$gender) + pheno$age)

    # Using this modSwitch variable as argument to ComBat doesn't work
    modSwitch <- model.matrix(~pheno$age + as.factor(pheno$gender))

The second call fails with:

    Error in solve.default(t(design) %*% design) :
      system is computationally singular: reciprocal condition number = 4.60793e-21

I've had a look at the package source files and I think the problem is in the design.mat function, in this code:

    # In this code ncov is the number of categorical covariates and
    # mod[,-tmp] is a matrix without the batch column and without the intercept
    if (ncov > 0) {
      for (j in 1:ncov) {
        tmp1 <- as.factor(as.matrix(mod[,-tmp])[,j])
        design <- build.design(tmp1, des=design)
      }
    }

So I guess that, by looping from 1 to the number of categorical covariates, the code assumes the categorical variables are at the beginning of the model matrix. When a numerical covariate comes first, it is the one that gets converted to a factor.

Could someone verify this? I'd be willing to provide example data if needed. Maybe the documentation for the ComBat function could be updated to state this clearly, or the function could be improved?

Regards,
Djie Thung
Bioinformatics Intern - Dept. of Human Genetics, UMC Utrecht
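The column-order effect described in the post can be illustrated with base R alone. This is only a sketch under stated assumptions: the `pheno` table below is made-up toy data, and the code does not call sva or reproduce its internals. It only mimics the step from the quoted loop where the first non-intercept column is passed to `as.factor`.

```r
# Hypothetical toy phenotype table (values invented for illustration).
pheno <- data.frame(
  gender = rep(c("F", "M"), times = 5),
  age    = c(23, 35, 41, 29, 52, 38, 47, 31, 60, 26)
)

# Same two covariates, in the two orders discussed in the post.
mod       <- model.matrix(~ as.factor(gender) + age, data = pheno)
modSwitch <- model.matrix(~ age + as.factor(gender), data = pheno)

# Drop the intercept, mirroring the post's description of mod[,-tmp].
covs       <- mod[, -1, drop = FALSE]
covsSwitch <- modSwitch[, -1, drop = FALSE]

# With one categorical covariate, a loop over 1:ncov only reads column 1.
# Count how many factor levels that column yields in each ordering.
first_levels        <- nlevels(as.factor(covs[, 1]))        # gender dummy
first_levels_switch <- nlevels(as.factor(covsSwitch[, 1]))  # age

print(colnames(covs))
print(colnames(covsSwitch))
print(c(first_levels, first_levels_switch))
```

In the switched ordering the first column is age, so converting it to a factor produces one level per distinct age rather than two levels for gender. A design built from that many indicator columns can easily become rank deficient, which would be consistent with the singularity error reported above.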
Genetics sva • 2.5k views
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.8 years ago
United States
Djie,

Thanks for pointing this out. It appears that, as the software is designed, you do need to order the categorical variables first in the mod argument. We will put out a new version of sva that addresses this issue and improves the speed of the fsva functions within the week.

Thanks so much for your help!

Jeff

On Tue, Feb 14, 2012 at 5:37 PM, Djie Tjwan Thung <djie.thung@gmail.com> wrote:
> [...]
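Until a fixed release is installed, one possible workaround is to reorder the model matrix so that categorical (0/1 dummy) columns come before numerical ones. The helper below is a hypothetical sketch, not part of sva: `order_categorical_first` and the toy `pheno` data are assumptions made for illustration, and the heuristic treats any column containing only 0 and 1 as categorical, so a numerical covariate that happens to take only those values would be misclassified.

```r
# Hypothetical helper (not an sva function): move 0/1 dummy columns,
# including the intercept, ahead of all other columns.
order_categorical_first <- function(mod) {
  is_dummy <- apply(mod, 2, function(x) all(x %in% c(0, 1)))
  mod[, c(which(is_dummy), which(!is_dummy)), drop = FALSE]
}

# Hypothetical toy phenotype table (values invented for illustration).
pheno <- data.frame(
  gender = rep(c("F", "M"), times = 5),
  age    = c(23, 35, 41, 29, 52, 38, 47, 31, 60, 26)
)

# Numerical covariate listed first, as in the failing example.
modSwitch <- model.matrix(~ age + as.factor(gender), data = pheno)

# Reordered copy with the gender dummy ahead of age.
modFixed <- order_categorical_first(modSwitch)

print(colnames(modSwitch))
print(colnames(modFixed))
```

Writing the formula with the categorical terms first achieves the same thing and is simpler when the formula is under your control; the helper is mainly useful when the model matrix is built elsewhere.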
Great, thanks for updating!

Djie

2012/2/15 Jeff Leek <jtleek@gmail.com>:
> [...]