Hi Tiger,
It is completely understandable with your study design that you will
get completely different results with different covariates and that in
some cases you will run into singularity issues. I can see your
difficulties stemming from the complex experimental design of your
study. However, please note that your question of inclusion/exclusion
of covariates is really a statistical/modeling question and not really
a ComBat question. At this time, I would strongly recommend that you
sit down with a statistician at your institution, discuss with him/her
your study design and research questions/goals, and design a linear
model (including batch) that will help you accomplish this goal. Once
you have this, as long as none of your covariates of interest are
confounded with batch, you can run your selected linear model through
ComBat to remove the batch effects as you desire (using all the
non-batch variables in your linear model as 'covariates').
Hope this helps,
Evan
On Feb 28, 2013, at 2:36 PM, hu duan wrote:
Hi Dr. Johnson,
Thanks for replying to my previous email. I have tried different
combinations of covariates; they all gave different results and some
failed due to singularity. I still do not know how to choose the
right covariates. I will try to explain my question more clearly this time.
Please see the attached PDF for the detailed analysis, including PCA.
The figure below shows all the possible covariates I could choose.
Group, stage, and window are progressively more detailed
classifications of all samples. They are dependent on one another, unlike
age or treatment group, which are independent in other experiments.
<image.png>
Each sample differs biologically from the others; they are not technical
replicates. I want to keep all biological variance between samples and
eliminate batch effects. How should I choose covariates?
You mentioned on the forum before that "Using that ComBat can estimate
what is really variance between the samples and what is due to the
batches"
<image.png>
In the figure above, how can ComBat avoid eliminating real sample variance
within the same subset determined by the covariates and remove only the
batch effect? Do you assume samples in the same subset follow the same
distribution? Can ComBat give a number showing what percentage of the
variance is due to batch effect and how much is left after ComBat?
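One simple quantitative summary of this kind (not something ComBat itself reports) is the share of each feature's total variance that is explained by batch membership, computed before and after adjustment. The sketch below uses a one-way sum-of-squares decomposition in plain Python; the function names and the toy numbers are hypothetical. Note that this marginal measure ignores covariates, so in an unbalanced design part of what it attributes to batch may be biology.

```python
from statistics import mean

def batch_r_squared(values, batch):
    """Fraction of one feature's total variance explained by batch membership."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant feature: nothing to explain
    groups = {}
    for v, b in zip(values, batch):
        groups.setdefault(b, []).append(v)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

def mean_batch_r_squared(matrix, batch):
    """Average the per-feature batch R^2 over all rows (features)."""
    return mean(batch_r_squared(row, batch) for row in matrix)

# Hypothetical toy data: one feature, four samples, two batches.
batch = [1, 1, 2, 2]
before = [[1.0, 2.0, 5.0, 6.0]]   # batch 2 is shifted upward
after = [[1.0, 2.0, 1.0, 2.0]]    # the same feature with the shift removed

print(round(100 * mean_batch_r_squared(before, batch), 1))  # 94.1
print(round(100 * mean_batch_r_squared(after, batch), 1))   # 0.0
```

Reporting the distribution of this percentage across all features, before and after adjustment, gives a single number per data set that can accompany the plots.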
Thank you very much
Best
Tiger
The bell plot can show the batch-removed result, but is there a good way
to give a quantitative parameter, so that we can know how ComBat performs
and can convince reviewers?
2013/2/21 Johnson, William Evan <wej@bu.edu>
Hey Tiger, See below:
On Feb 19, 2013, at 2:16 PM, hu duan wrote:
Hi Dr. Johnson,
I have one question about multiple covariates in ComBat analysis that
has bothered me for two weeks, and I really need your advice.
My microarray experiment design is listed below; please look at
the attachment for details.
1. I ran the experiments on 4 different dates in a randomized way, so
there are 4 batches.
2. Each sample was run with 2 or 3 replicates.
3. 4 parameters (individual, age, status, stage) are associated with each
sample. None of them is a nuisance variable. Batch is the ONLY effect I
want to eliminate.
Analysis I want to do:
I want to perform statistical tests to find features that differ
significantly between groups defined by status first, and then see how
they behave across different individuals, stages, and ages. I will later
select features to distinguish different stages.
So I need a method that considers individual, age, status, and stage at
the same time when doing the ComBat analysis, so that their biological
differences will not be eliminated.
My question is:
What covariates do I need to include?
You need to include them all as covariates. Make sure to include age
as a numerical covariate.
What will be the influence without including them?
If your experimental design is balanced across batches, you may not
see much of a difference. If there is some imbalance in your design
(say there is a higher proportion of treatments in one batch vs
another, or if the patients in one batch are younger than in another
batch) then you will be removing biological variation along with the
batch variation. Adding covariates will ensure that the biological
variation is untouched by the batch adjustment.
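The effect of an unbalanced design can be illustrated with a deliberately simplified location-only adjustment (ordinary least squares on a single noise-free feature), which is not ComBat's empirical Bayes procedure. All names and numbers below are hypothetical: the true batch shift is 5, the true treatment effect is 2, and batch 2 contains three times as many treated samples as batch 1.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a x = b by exact Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def least_squares(design, y):
    """Ordinary least squares through the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def remove_batch(y, batch2, treated=None):
    """Subtract the fitted batch-2 shift, optionally adjusting for treatment."""
    if treated is None:
        design = [[1, b] for b in batch2]                        # batch only
    else:
        design = [[1, b, t] for b, t in zip(batch2, treated)]   # batch + covariate
    batch_coef = least_squares(design, y)[1]
    return [v - batch_coef * b for v, b in zip(y, batch2)]

def treatment_effect(y, treated):
    """Difference between the treated and untreated sample means."""
    on = [v for v, t in zip(y, treated) if t]
    off = [v for v, t in zip(y, treated) if not t]
    return sum(on) / len(on) - sum(off) / len(off)

# Hypothetical unbalanced design: 1 of 4 treated in batch 1, 3 of 4 in batch 2.
batch2 = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
y = [10 + 5 * b + 2 * t for b, t in zip(batch2, treated)]

without_cov = remove_batch(y, batch2)
with_cov = remove_batch(y, batch2, treated)
print(float(treatment_effect(without_cov, treated)))  # 1.5: part of the signal is lost
print(float(treatment_effect(with_cov, treated)))     # 2.0: signal preserved
```

Without the covariate, the fitted batch shift is 6 rather than 5 because it absorbs part of the treatment difference, so subtracting it shrinks the treatment effect. With treatment in the model, only the true shift is removed.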
Can you mention the principle of choosing multiple covariates?
Not completely sure what you mean here. Again, you include any
covariates of interest so that the biological signal is not removed
during the batch adjustment. As far as which covariates to include:
any variables which you don't want removed from the data.
Hope this answers your questions.
Thanks
Tiger
PS:
The bell plot can show the batch-removed result. But is there a good way
to give a quantitative parameter, such as a percentage showing how much
of the batch effect has been removed, so that we can know how ComBat
performs and can convince reviewers?
--
Hu Duan (Tiger)
Biological Design PhD student
Graduate Research Associate
Center for Innovation in Medicine
The Biodesign Institute, Arizona State University
----------------------------------------------------------------------
-----------------
"MY MIND REBELS AT STAGNATION." -- Sherlock Holmes
----------------------------------------------------------------------
-----------------
<tiger-design matrix_020613.xlsx>
<20130228_ComBat Covariate_Tiger.pdf>