Renaud Gaujoux
Hi,
I've got a microarray dataset (Illumina) from a blood assay, with a
case-control factor of interest.
I also have several other covariates (gender, weight, etc.).
I know that the experimental design is highly unbalanced with respect
to Gender:

          female  male
control       12     7
case           7    17

Therefore, if there is a Gender effect, it really needs to be
included in any subsequent analysis (differential expression with
limma, classification). I do not want to find differences between
cases and controls that are actually due to Gender.
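
As an illustration of how unbalanced the table above is, here is a minimal sketch that computes the sample odds ratio and a two-sided Fisher exact p-value from the four counts. It is plain Python using only the standard library, and the function name `fisher_exact_2x2` is hypothetical, chosen for this example only. It assumes no zero cell in the off-diagonal, which would make the odds ratio undefined.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(p_value, 1.0)

# Counts from the table: control = (12 female, 7 male), case = (7 female, 17 male).
odds_ratio, p_value = fisher_exact_2x2(12, 7, 7, 17)
print(f"odds ratio = {odds_ratio:.2f}, Fisher p = {p_value:.3f}")
```

An odds ratio far from 1 only says that Status and Gender are associated in the sample; it does not say whether Gender affects expression.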
Some questions around that:
- What would be the "best practice" way of finding out whether Gender
(or any other covariate) actually has an effect that needs to be dealt
with? (I would rather not have to bother with it.)
What I did: ran limma on ~ Status + Gender and looked at the q-values
for the Gender coefficient (?)
- Some of the genes show a Gender effect, whereas the rest do not.
In that case, is it a good idea to include Gender for all genes?
Can we use two different models? What about the multiple testing
correction in that case?
- Supposing we decide to take Gender into account in the analysis,
do you know of classification methods that allow some correction for
a covariate? I cannot correct my original data for Gender without
including the case-control status in the model, because I think that
would remove a lot of the effect of interest (cf. the unbalanced
design). Therefore, I need to cross-validate any gender correction if
I do not want to bias the classification result. This increases the
complexity of the classification procedure and reduces the choice of
method, since not all methods give access to their internal machinery
(cf. Random Forest: can I hook into the splitting method to use a
gender-corrected split?).
- Any other suggestions for dealing properly with this kind of very
annoying unbalanced design?
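
To make the concern behind these questions concrete, here is a minimal sketch with a simulated, noise-free gene whose expression depends on Gender only, using the group sizes from the table. Fitting ~ Status alone gives a spurious case-control difference of roughly 1.02, while ~ Status + Gender gives a Status coefficient of essentially zero. This is plain Python with a hand-written least-squares solver, not limma; the helper `ols`, the baseline of 5 and the Gender shift of 3 are all assumptions made for illustration.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(p):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * b for a, b in zip(A[row], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# (status, gender, count): status 1 = case, gender 1 = male; counts from the table.
cells = [(0, 0, 12), (0, 1, 7), (1, 0, 7), (1, 1, 17)]
samples = [(s, g) for s, g, n in cells for _ in range(n)]

# Hypothetical gene: expression shifts by 3 with Gender, no Status effect at all.
y = [5.0 + 3.0 * g for s, g in samples]

unadjusted = ols([[1, s] for s, g in samples], y)    # ~ Status
adjusted = ols([[1, s, g] for s, g in samples], y)   # ~ Status + Gender
print("Status coefficient, ~ Status:          %.3f" % unadjusted[1])
print("Status coefficient, ~ Status + Gender: %.3f" % adjusted[1])
```

The same comparison on real data would of course include noise, so the adjusted coefficient would not be exactly zero.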
Thanks for your help and comments.