covariate information
0
0
Entering edit mode
@w-evan-johnson-5447
Last seen 6 months ago
United States
Chol-hee, So I think I now better understand your issue. Why do you have individual ID in the model as a covariate? Do you have repeated measures in your dataset? If not, you should NOT be including individual ID in as a covariate. If it is a repeated measures experiment, then is seems that individual is nested within your treatments/covariates (do you only have one individual ID per treatment combination?). Also, note that you should be treating age as a numerical covariate. I would strongly advise to you that you go meet with a statistician with experience in linear models and discuss with her/him what covariates should be included in your model (and why!). It seems that your question is more than just a ComBat question but rather a design and linear modeling question. Any good statistician should be able to help you with this. Thanks! Evan On Oct 26, 2012, at 12:46 AM, Cholhee Jung wrote: > Dear Evan, > > Thank you for your reply. > > So, my understanding is that you suggest my first attempt as the more > appropriate way to run ComBat than the second attempt. > However, removing covariate one after another until I don't get the > singularity error spared just one covariate. > > While the only remaining covariate is the most important one (as it > is the individual ID), I wonder if it's really OK to lose all other > covariate information, such as sample preparation methods, age, > disease etc, for the technical artefact removal. > > I'm just worried about losing actual signals coming up between these > covariates by ComBat. > > Regards, > Chol-hee > > > On Fri, Oct 26, 2012 at 12:19 AM, W. Evan Johnson <wej at="" bu.edu=""> wrote: >> Chol-hee, >> >> Notice the simple example: >> >>> x=as.factor(c(1,0,1,0));y=as.factor(c(1,2,1,2));z=rnorm(4) >> >> Notice that x and y are the same covariate. Now: >> >>> design=model.matrix(z~x+y) >>> design >> (Intercept) x1 y2 >> 1 1 1 0 >> 2 1 0 1 >> 3 1 1 0 >> 4 1 0 1 >> attr(,"assign") >> [1] 0 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$x >> [1] "contr.treatment" >> >> attr(,"contrasts")$y >> [1] "contr.treatment" >> >>> solve(t(design)%*%design) >> Error in solve.default(t(design) %*% design) : >> Lapack routine dgesv: system is exactly singular >> >> You get the singularity error because your covariates are exactly the same >> (or not linearly independent). Now if you concatenate the variables like you >> did: >> >>> xy=as.factor(paste(x,y,sep='.')) >>> xy >> [1] 1.1 0.2 1.1 0.2 >> Levels: 0.2 1.1 >> >> Which is clearly different than the original x+y. Now: >> >>> design=model.matrix(z~xy) >>> design >> (Intercept) xy1.1 >> 1 1 1 >> 2 1 0 >> 3 1 1 >> 4 1 0 >> attr(,"assign") >> [1] 0 1 >> attr(,"contrasts") >> attr(,"contrasts")$xy >> [1] "contr.treatment" >> >>> solve(t(design)%*%design) >> (Intercept) xy1.1 >> (Intercept) 0.5 -0.5 >> xy1.1 -0.5 1.0 >> >> Which now works. However, note that I wouldn't use the latter. You need to >> find out which of your six covariates are linearly dependent with each other >> and remove one or more so they are NOT linearly dependent. This will be >> different from your second attempt but will be equivalent to what you were >> trying to accomplish in your first attempt >> >> Let me know if this doesn't work! >> >> Thanks! >> >> Evan >> >> -- >> W. Evan Johnson >> Assistant Professor >> Division of Computational Biomedicine >> Boston University School of Medicine >> 72 East Concord St., E-645 >> Boston, MA 02118 >> Phone: (617) 638-2541 >> >> >> >> On Oct 25, 2012, at 6:00 AM, bioconductor-request at r-project.org wrote: >> >> ------------------------------ >> >> Message: 8 >> Date: Wed, 24 Oct 2012 13:40:33 -0400 >> From: Achilleas Pitsillides <anp4r at="" virginia.edu=""> >> To: Cholhee Jung <jung.cholhee at="" gmail.com="">, Bioconductor mailing list >> <bioconductor at="" stat.math.ethz.ch=""> >> Subject: Re: [BioC] Fwd: covariate information >> Message-ID: >> <cabdy-6=dp5-wu+zqo4-upmbpx51zecrpqcwdsv5ygyhg2z821a at="" mail.gmail.com=""> >> Content-Type: text/plain >> >> Dear Chol-hee, >> >> The short answer is that the two model matrices are different and they have >> different dimensions; you can verify this by using the dim(mod_mat) to see >> the dimension of the model matrix. >> >> Here is my understanding: If you have a factor f1 with 3 levels and a >> factor f2 with 2 levels (where all the possible level combinations exist), >> then f1:f2 is all the combinations of f1 and f2 (an equivalent factor >> with 6 levels) and the model matrix ~f1:f2 would have six columns (i.e. >> fit a model with 6 coefficients). >> However, the model matrix ~f1+f2 will have 4 columns ( i.e. fit 4 >> coefficients: constant, two for f1 and one for f2). >> The model ~f1*f2 will fit 6 coefficients and have a model matrix with the >> same column space as the model matrix for ~f1:f2. >> >> >> I hope this helps, >> >> cheers, >> Achilleas >> >> >> On Tue, Oct 23, 2012 at 10:10 PM, Cholhee Jung >> <jung.cholhee at="" gmail.com="">wrote: >> >> >> >> Dear users, >> >> >> Below is the question I posted originally in the 'ComBat user forum' of >> >> Google group. >> >> But, as I was suggested to forward my question to Bioconductor mailing >> >> list, I'm doing it now. >> >> >> Please find my question, below. >> >> >> >> Regards, >> >> Chol-hee >> >> >> On Tuesday, October 23, 2012 10:13:26 AM UTC+11, Cholhee Jung wrote: >> >> >> >> >> Dear users, >> >> >> I was trying ComBat on ~1,000 samples. >> >> Samples are spread over 12 batches and each batch contains 4 technical >> >> replicates that are identical across all batches. >> >> The number of covariates is 5, and I was using the ComBat implemented in >> >> the 'sva' package. >> >> >> I tried ComBat with two model matrix built from the same covariate >> >> information. >> >> >> First model matrix was constructed as below: >> >> mod_mat = model.matrix(~as.factor(cov1) + as.factor(cov2) + >> >> as.factor(cov3) + as.factor(cov4) + as.factor(cov5), data=pheno_data ) >> >> >> Second one was built as below: >> >> mod_mat = model.matrix(~as.factor(paste(pheno_data$cov1, >> >> pheno_data$cov2, pheno_data$cov3, pheno_data$cov4, pheno_data$cov5, >> >> sep=":"))) >> >> >> Basically, covariates were concatenated into one string for the the >> >> second model matrix. >> >> >> ComBat with the first model matrix raised the 'singular' error like >> >> below: >> >> >> Error in solve.default(t(design) %*% design) : >> >> Lapack routine dgesv: system is exactly singular >> >> >> But, ComBat run without error with the second model matrix. >> >> >> >> Now I wonder if the two different model matrices are same? >> >> >> Regards, >> >> Chol-hee >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> [[alternative HTML version deleted]] >> >> >
GO GO • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 961 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6