Fwd: covariate information
Cholhee Jung ▴ 10
@cholhee-jung-5571
Last seen 10.3 years ago
Dear users,

Below is the question I originally posted in the 'ComBat user forum' Google group. As it was suggested that I forward it to the Bioconductor mailing list, I am doing so now.

Regards,
Chol-hee

On Tuesday, October 23, 2012 10:13:26 AM UTC+11, Cholhee Jung wrote:

> Dear users,
>
> I was trying ComBat on ~1,000 samples. The samples are spread over 12 batches, and each batch contains 4 technical replicates that are identical across all batches. There are 5 covariates, and I was using the ComBat implementation in the 'sva' package.
>
> I tried ComBat with two model matrices built from the same covariate information.
>
> The first model matrix was constructed as below:
>
>     mod_mat = model.matrix(~as.factor(cov1) + as.factor(cov2) + as.factor(cov3) + as.factor(cov4) + as.factor(cov5), data=pheno_data)
>
> The second one was built as below:
>
>     mod_mat = model.matrix(~as.factor(paste(pheno_data$cov1, pheno_data$cov2, pheno_data$cov3, pheno_data$cov4, pheno_data$cov5, sep=":")))
>
> Basically, the covariates were concatenated into one string for the second model matrix.
>
> ComBat with the first model matrix raised the 'singular' error below:
>
>     Error in solve.default(t(design) %*% design) :
>       Lapack routine dgesv: system is exactly singular
>
> But ComBat ran without error with the second model matrix.
>
> Now I wonder whether the two model matrices are the same?
>
> Regards,
> Chol-hee
@achilleas-pitsillides-4316
Last seen 10.3 years ago
Dear Chol-hee,

The short answer is that the two model matrices are different and have different dimensions; you can verify this by calling dim(mod_mat) on each of them.

Here is my understanding: if you have a factor f1 with 3 levels and a factor f2 with 2 levels (where all the possible level combinations exist), then f1:f2 is all the combinations of f1 and f2 (equivalent to a single factor with 6 levels), and a model matrix built from that combined factor would have six columns (i.e. fit a model with 6 coefficients). However, the model matrix ~f1+f2 will have 4 columns (i.e. fit 4 coefficients: a constant, two for f1 and one for f2). The model ~f1*f2 will fit 6 coefficients and have a model matrix with the same column space as the model matrix for the combined factor.

I hope this helps,

cheers,
Achilleas
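The column counts described in this answer can be checked on a toy example. The following is a minimal sketch, not part of the original reply: the data frame `pheno` and the matrix names are hypothetical, and `interaction(f1, f2)` stands in for the pasted-together factor from the question.

```r
# Hypothetical toy data: f1 has 3 levels, f2 has 2 levels,
# and every one of the 6 combinations occurs (twice).
grid  <- expand.grid(f1 = c("a", "b", "c"), f2 = c("x", "y"))
pheno <- grid[rep(seq_len(nrow(grid)), times = 2), ]

# Additive model: intercept + 2 columns for f1 + 1 column for f2
additive <- model.matrix(~ f1 + f2, data = pheno)

# Main effects plus interaction
crossed <- model.matrix(~ f1 * f2, data = pheno)

# Single combined factor, analogous to paste(cov1, cov2, ..., sep = ":")
combined <- model.matrix(~ interaction(f1, f2), data = pheno)

ncol(additive)  # 4
ncol(crossed)   # 6
ncol(combined)  # 6

# Same column space: stacking the two 6-column matrices side by side
# does not raise the rank above 6.
qr(cbind(crossed, combined))$rank  # 6
```

The additive matrix spans a smaller space than the other two, which is why a model built from the concatenated covariates is not the same model as the one built with `+`.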
@w-evan-johnson-5447
Last seen 6 months ago
United States
Chol-hee,

Consider this simple example:

    > x=as.factor(c(1,0,1,0));y=as.factor(c(1,2,1,2));z=rnorm(4)

Notice that x and y encode the same covariate (y is just a relabelling of x). Now:

    > design=model.matrix(z~x+y)
    > design
      (Intercept) x1 y2
    1           1  1  0
    2           1  0  1
    3           1  1  0
    4           1  0  1
    attr(,"assign")
    [1] 0 1 2
    attr(,"contrasts")
    attr(,"contrasts")$x
    [1] "contr.treatment"

    attr(,"contrasts")$y
    [1] "contr.treatment"

    > solve(t(design)%*%design)
    Error in solve.default(t(design) %*% design) :
      Lapack routine dgesv: system is exactly singular

You get the singularity error because your covariates are exactly the same (or, more generally, not linearly independent). Now if you concatenate the variables like you did:

    > xy=as.factor(paste(x,y,sep='.'))
    > xy
    [1] 1.1 0.2 1.1 0.2
    Levels: 0.2 1.1

which is clearly different from the original x+y. Now:

    > design=model.matrix(z~xy)
    > design
      (Intercept) xy1.1
    1           1     1
    2           1     0
    3           1     1
    4           1     0
    attr(,"assign")
    [1] 0 1
    attr(,"contrasts")
    attr(,"contrasts")$xy
    [1] "contr.treatment"

    > solve(t(design)%*%design)
                (Intercept) xy1.1
    (Intercept)         0.5  -0.5
    xy1.1              -0.5   1.0

which now works. However, note that I wouldn't use the latter. You need to find out which of your five covariates are linearly dependent on each other and remove one or more of them so that the remaining ones are NOT linearly dependent. This will be different from your second attempt, but will be equivalent to what you were trying to accomplish in your first attempt.

Let me know if this doesn't work!

Thanks!

Evan

--
W. Evan Johnson
Assistant Professor
Division of Computational Biomedicine
Boston University School of Medicine
72 East Concord St., E-645
Boston, MA 02118
Phone: (617) 638-2541
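One possible way to locate the linearly dependent columns mentioned above is to look at the pivoting of a QR decomposition of the model matrix. This is only a sketch under stated assumptions, not something from the original reply: the small `pheno_data` table below is made up (cov2 is deliberately a relabelling of cov1), and which of two confounded columns gets flagged depends on their order in the formula.

```r
# Hypothetical phenotype table: cov2 is a relabelling of cov1 (confounded),
# cov3 carries independent information.
pheno_data <- data.frame(
  cov1 = factor(c("a", "b", "a", "b", "a", "b")),
  cov2 = factor(c("x", "y", "x", "y", "x", "y")),
  cov3 = factor(c("m", "m", "n", "n", "m", "n"))
)
mod_mat <- model.matrix(~ cov1 + cov2 + cov3, data = pheno_data)

# A rank smaller than the number of columns means t(X) %*% X is singular.
qr_fit <- qr(mod_mat)
qr_fit$rank < ncol(mod_mat)  # TRUE

# Columns that qr() pivoted to the end are linear combinations of earlier ones.
redundant <- colnames(mod_mat)[qr_fit$pivot[-seq_len(qr_fit$rank)]]
redundant  # "cov2y"

# Keep only the independent columns; the normal equations are now solvable.
mod_reduced <- mod_mat[, qr_fit$pivot[seq_len(qr_fit$rank)], drop = FALSE]
xtx_inv <- solve(t(mod_reduced) %*% mod_reduced)
```

Dropping columns this way is mechanical; deciding which covariate to remove from the design is still a judgment about the study, since the flagged column only tells you where the redundancy is.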