Entering edit mode
Chol-hee,
So I think I now better understand your issue. Why do you have
individual ID in the model as a covariate? Do you have repeated
measures in your dataset? If not, you should NOT be including
individual ID in as a covariate. If it is a repeated measures
experiment, then is seems that individual is nested within your
treatments/covariates (do you only have one individual ID per
treatment combination?). Also, note that you should be treating age as
a numerical covariate.
I would strongly advise to you that you go meet with a statistician
with experience in linear models and discuss with her/him what
covariates should be included in your model (and why!). It seems that
your question is more than just a ComBat question but rather a design
and linear modeling question. Any good statistician should be able to
help you with this.
Thanks!
Evan
On Oct 26, 2012, at 12:46 AM, Cholhee Jung wrote:
> Dear Evan,
>
> Thank you for your reply.
>
> So, my understanding is that you suggest my first attempt as the
more
> appropriate way to run ComBat than the second attempt.
> However, removing covariate one after another until I don't get the
> singularity error spared just one covariate.
>
> While the only remaining covariate is the most important one (as it
> is the individual ID), I wonder if it's really OK to lose all other
> covariate information, such as sample preparation methods, age,
> disease etc, for the technical artefact removal.
>
> I'm just worried about losing actual signals coming up between these
> covariates by ComBat.
>
> Regards,
> Chol-hee
>
>
> On Fri, Oct 26, 2012 at 12:19 AM, W. Evan Johnson <wej at="" bu.edu="">
wrote:
>> Chol-hee,
>>
>> Notice the simple example:
>>
>>> x=as.factor(c(1,0,1,0));y=as.factor(c(1,2,1,2));z=rnorm(4)
>>
>> Notice that x and y are the same covariate. Now:
>>
>>> design=model.matrix(z~x+y)
>>> design
>> (Intercept) x1 y2
>> 1 1 1 0
>> 2 1 0 1
>> 3 1 1 0
>> 4 1 0 1
>> attr(,"assign")
>> [1] 0 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$x
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$y
>> [1] "contr.treatment"
>>
>>> solve(t(design)%*%design)
>> Error in solve.default(t(design) %*% design) :
>> Lapack routine dgesv: system is exactly singular
>>
>> You get the singularity error because your covariates are exactly
the same
>> (or not linearly independent). Now if you concatenate the variables
like you
>> did:
>>
>>> xy=as.factor(paste(x,y,sep='.'))
>>> xy
>> [1] 1.1 0.2 1.1 0.2
>> Levels: 0.2 1.1
>>
>> Which is clearly different than the original x+y. Now:
>>
>>> design=model.matrix(z~xy)
>>> design
>> (Intercept) xy1.1
>> 1 1 1
>> 2 1 0
>> 3 1 1
>> 4 1 0
>> attr(,"assign")
>> [1] 0 1
>> attr(,"contrasts")
>> attr(,"contrasts")$xy
>> [1] "contr.treatment"
>>
>>> solve(t(design)%*%design)
>> (Intercept) xy1.1
>> (Intercept) 0.5 -0.5
>> xy1.1 -0.5 1.0
>>
>> Which now works. However, note that I wouldn't use the latter. You
need to
>> find out which of your six covariates are linearly dependent with
each other
>> and remove one or more so they are NOT linearly dependent. This
will be
>> different from your second attempt but will be equivalent to what
you were
>> trying to accomplish in your first attempt
>>
>> Let me know if this doesn't work!
>>
>> Thanks!
>>
>> Evan
>>
>> --
>> W. Evan Johnson
>> Assistant Professor
>> Division of Computational Biomedicine
>> Boston University School of Medicine
>> 72 East Concord St., E-645
>> Boston, MA 02118
>> Phone: (617) 638-2541
>>
>>
>>
>> On Oct 25, 2012, at 6:00 AM, bioconductor-request at r-project.org
wrote:
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Wed, 24 Oct 2012 13:40:33 -0400
>> From: Achilleas Pitsillides <anp4r at="" virginia.edu="">
>> To: Cholhee Jung <jung.cholhee at="" gmail.com="">, Bioconductor mailing
list
>> <bioconductor at="" stat.math.ethz.ch="">
>> Subject: Re: [BioC] Fwd: covariate information
>> Message-ID:
>> <cabdy-6=dp5-wu+zqo4-upmbpx51zecrpqcwdsv5ygyhg2z821a at="" mail.gmail.com="">
>> Content-Type: text/plain
>>
>> Dear Chol-hee,
>>
>> The short answer is that the two model matrices are different and
they have
>> different dimensions; you can verify this by using the dim(mod_mat)
to see
>> the dimension of the model matrix.
>>
>> Here is my understanding: If you have a factor f1 with 3 levels
and a
>> factor f2 with 2 levels (where all the possible level combinations
exist),
>> then f1:f2 is all the combinations of f1 and f2 (an equivalent
factor
>> with 6 levels) and the model matrix ~f1:f2 would have six columns
(i.e.
>> fit a model with 6 coefficients).
>> However, the model matrix ~f1+f2 will have 4 columns ( i.e. fit 4
>> coefficients: constant, two for f1 and one for f2).
>> The model ~f1*f2 will fit 6 coefficients and have a model matrix
with the
>> same column space as the model matrix for ~f1:f2.
>>
>>
>> I hope this helps,
>>
>> cheers,
>> Achilleas
>>
>>
>> On Tue, Oct 23, 2012 at 10:10 PM, Cholhee Jung
>> <jung.cholhee at="" gmail.com="">wrote:
>>
>>
>>
>> Dear users,
>>
>>
>> Below is the question I posted originally in the 'ComBat user
forum' of
>>
>> Google group.
>>
>> But, as I was suggested to forward my question to Bioconductor
mailing
>>
>> list, I'm doing it now.
>>
>>
>> Please find my question, below.
>>
>>
>>
>> Regards,
>>
>> Chol-hee
>>
>>
>> On Tuesday, October 23, 2012 10:13:26 AM UTC+11, Cholhee Jung
wrote:
>>
>>
>>
>>
>> Dear users,
>>
>>
>> I was trying ComBat on ~1,000 samples.
>>
>> Samples are spread over 12 batches and each batch contains 4
technical
>>
>> replicates that are identical across all batches.
>>
>> The number of covariates is 5, and I was using the ComBat
implemented in
>>
>> the 'sva' package.
>>
>>
>> I tried ComBat with two model matrix built from the same covariate
>>
>> information.
>>
>>
>> First model matrix was constructed as below:
>>
>> mod_mat = model.matrix(~as.factor(cov1) + as.factor(cov2) +
>>
>> as.factor(cov3) + as.factor(cov4) + as.factor(cov5),
data=pheno_data )
>>
>>
>> Second one was built as below:
>>
>> mod_mat = model.matrix(~as.factor(paste(pheno_data$cov1,
>>
>> pheno_data$cov2, pheno_data$cov3, pheno_data$cov4, pheno_data$cov5,
>>
>> sep=":")))
>>
>>
>> Basically, covariates were concatenated into one string for the the
>>
>> second model matrix.
>>
>>
>> ComBat with the first model matrix raised the 'singular' error like
>>
>> below:
>>
>>
>> Error in solve.default(t(design) %*% design) :
>>
>> Lapack routine dgesv: system is exactly singular
>>
>>
>> But, ComBat run without error with the second model matrix.
>>
>>
>>
>> Now I wonder if the two different model matrices are same?
>>
>>
>> Regards,
>>
>> Chol-hee
>>
>>
>>
>> _______________________________________________
>>
>> Bioconductor mailing list
>>
>> Bioconductor at r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>> Search the archives:
>>
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>