Entering edit mode
Ramon Diaz
★
1.1k
@ramon-diaz-159
Last seen 10.3 years ago
Dear Naomi,
>
> If you treat the biological subject as a random effect, then the
mixed
> model ANOVA tests of all the treatment and other effects are
identical to
> what you get if you average the technical reps (within treatment).
(This
> assumes a balanced dye-swap design, with no missing observations.)
>
> If you run lme on the experiment, and set everything up correctly,
this is
> what you will get and the df will be the same.
Sure we get the same estimates of effects and variances if, as you
say, we
have "balanced dye-swap design, with no missing observations", and
there are
no problems with convergence. However, I think I was seeing problems
with
convergence because of the tiny sample sizes.
As for df: I think there is no consensus there when there is unbalance
and
even some times when there is balance). Pinheiro & Bates, in their
book,
explain how they compute df I think in ch. 2 of their book (I don't
have it
here). SAS PROC MIXED contains several different ways of going about
the df
business (I think they are explained in the SAS documentation and the
"SAS
system for mixed models" book by Wolfinger, Stroup, Little, Milliken
---order
of authors might be wrong; I have the book at home). And
randomizedBlock
seems to do something different from the above.
Before analyzing microarray data, I never worried too much about these
issues
because the differences were generally inconsequential. But, with
these tiny
sample sizes, the consequences are not always inconsequential IF,
despite
admonitions, your coauthors insist (often because editors of funders
require
it) on using p-values for establishing a threshold.
> After struggling with this (and huge amounts of imbalance due to bad
spots,
> unequal spot replication, etc) I am pretty much settling for doing
the
> averaging (over all technical reps) before sending my data to the
ANOVA
> routine. (I should do a weighted analysis is account for this, but
so far
> I have not.)
I would appreciate if you can let me know of any ideas about how to do
the
weighting, and whether or not you find it makes a large difference.
To do this, I have to do a lot of slow
> preprocessing. However, because I am working in a non-medical
environment
> in which the number of replicates is generally small, the number of
> treatments is generally large, and investigator patience is
reasonably
> good, this seems to be the best solution for our group. The biggest
> draw-back is that to date I have not fully automated the procedure,
so I
> cannot off-load the data-processing to a research assistant (or
publish
> it to this list).
I understand it can be slow because of sheer number of arrays and
genes, etc,
but for me at least code is just a few lines. This is probably
unneded, but
this is what I do to average tech. reps:
### we have 6 arrays, 3 biol. reps. and 2 tech. reps per biol. rep.
data1 ### the data (log ratios of normalized data); columns are arrays
### each pair of consecutive columns corresponds to the 2 tech. reps.
dye.swap.codes <- c(-1, 1, -1, 1, -1, 1)
## "positivize" or set in all the control as denominator in the log
ratio.
data1.p <- t(t(data1) * dye.swap.codes)
tech.rep <- factor(c(1, 1, 2, 2, 3, 3))
data1.p.mean <- t(apply(data1.p, 1, function(x) tapply(x, tech.rep,
mean,
na.rm = TRUE))) ## notice brute force handling of missings
## Fit the lm (i.e., do the t-test)
lfit.data1 <- lm.series(data1.p.mean)
> Regarding randomized blocks:
>
> We have a RCB design if each biological sample gets all of the
treatments
> (e.g. cancer cells and normal cells from the same subject;
different
> organs from the same mouse). In this case, the "error" for testing
> treatment effects is biological sample * treatment, and the
technical
> replicates appear as "pure error". The d.f. should be correct. If
I have
> time, I will look carefully at your design and model over the
weekend to
> see if I can determine why you are not getting the right d.f.
I might have messed up, but for the examples I used, it is a RCB if we
look at
the effect of dye (the dye-swap is done within subject; subject, the
biol.
rep. is the "whole plot error", whereas each of the two tech. reps.
are the
"split plots"). But I don't care about effects of dye here.
>From the help of randomizedBlock, and the fact that it is analogous
to a
lme(y ~ fixed, random = ~ 1|subjects),
I understand if can be used here.
In the simple setting I was using, the lme model would have been y ~ 1
(or
y ~ -1 + dye.swap.codes, if we do not set all controls in denominator,
but
some have it in numerator and some in denominator).
In fact, the outcome from randomizedBlock (estimate, standard error of
estimate), as it should be, is identical to a t-test of the means of
tech
replicates. They only differ w.r.t. dfs.
>
> I hope that this is helpful, and apologize for not giving this all
of the
> attention needed.
It certainly is helpful. Thanks a lot for your comments.
Best,
R.
>
> --Naomi
>
> At 10:49 AM 3/31/2004, Ramon Diaz-Uriarte wrote:
> >Dear Gordon, Naomi, and BioC list,
> >
> >The issue of how to deal with technical replicates (such as those
obtained
> >when we do dye-swaps of the same biological samples in cDNA arrays)
has
> > come up in the BioC list several times. What follows is a short
summary,
> > with links to mails in BioC plus some questions/comments.
> >
> >
> >There seem to be three major ways of approaching the issue:
> >
> >
> >1. Treat the technical replicates as ordinary replicates
> >*************************************************************
> >E.g., Gordon Smyth in sept. 2003
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-Septembe
r/00240
> >5.html)
> >
> >However, this makes me (and Naomi Altman ---e.g.,
> >https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-December/
003340.
> >html)
> >
> >uneasy (tech. reps. are not independent biological reps. which
leads to
> > the usual inflation of dfs and deflation of se).
> >
> >I guess part of the key to Gordon's suggestion is his comment that
even if
> >the
> >s.e. are slightly underestimated, the ordering is close to being
the
> > optimal one. But I don't see why the ordering out to be much worse
if we
> > use the means of technical replicates as in 3. below. (Haven't
done the
> > math, but it seems that, specially in the pressence of strong
tech. reps.
> > covariance and small number of independent samples we ought to be
better
> > of using the means of the tech. reps).
> >
> >
> >2. Mixed effects models with subject as random effect (e.g., via
lme).
> >*******************************************************************
*******
> >****
> >
> >In late August of 2003 I asked about these issues, and Gordon
seemed to
> > agree that trying the lme approach could be a way to go.
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-August/0
02224.h
> >tml).
> >
> >However, in September, I posted an aswer and included code that, at
least
> > for some cases, shows potential problems with using lme when the
number
> > of technical replicates is small.
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-Septembe
r/00243
> >0.html)
> >
> >Nevertheless, Naomi Altman reports using lme/mixed models in a
couple of
> >emails
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-December
/003191
> >.html;
> >
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-January/
003481.
> >html).
> >
> >After reading about randomizedBlock (package statmod) in a message
in BioC
> > (I think from Gordon), I have tried aggain the mixed models
approach,
> > since with tech. reps and no other random effects, we should be
able to
> > use
> >randomizedBlock. Details in 5. below:
> >
> >
> >3. Take the average of the technical replicates
> >****************************************************
> >Except for being possibly conservative (and not estimating tech.
reps.
> >variance component), I think this is a "safe" procedure.
> >This is what I have ended up doing routinely after my disappointing
tries
> >with
> >lme
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-Septembe
r/00243
> >0.html) and what Bill Kenworthy seemed to end up doing
> >(https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-January/
003493.
> >html).
> >
> >I think this is also what is done at least some times in literature
(e.g.,
> >Huber et al., 2002, Bioinformatics, 18, S96--S104 [the vsn paper]).
> >
> >*********
> >
> >4. Dealing with replicates in future versions of limma
> >***********************************************************
> >
> >Now, in Sept. 2004 Gordon mentioned that an explicit treatment of
tech.
> > reps. would be in a future version of limma
> >(
> >https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-September
/002411
> >.html) and I wonder if Gordon meant via mixed-effects models, or
some
> > other way, or if there has been some progress in this area.
> >
> >
> >
> >5. Using randomizedBlock
> >*****************************
> >In a simple set up of control and treatment with dye-swaps, I have
done
> > some numerical comparisons of the outcome of a t-test on the mean
of the
> > technical replicates with lme and with randomizedBlock. [The
function is
> > attached]. The outcome of the t-test of the means of replicates
and
> > randomizedBlock, in terms of the t-statistic, tends to be the same
(if we
> > "positivize" the dye swaps). The only difference, then, lies in
the df we
> > then use to put a p-value on the statistic. But I don't see how we
can
> > use the dfs from randomizedBlock: they seem way too large. Where
am I
> > getting lost?
> >
> >
> >Best,
> >
> >
> >R.
> >
> >
> >
> >--
> >Ram?n D?az-Uriarte
> >Bioinformatics Unit
> >Centro Nacional de Investigaciones Oncol?gicas (CNIO)
> >(Spanish National Cancer Center)
> >Melchor Fern?ndez Almagro, 3
> >28029 Madrid (Spain)
> >Fax: +-34-91-224-6972
> >Phone: +-34-91-224-6900
> >
> >http://bioinfo.cnio.es/~rdiaz
> >PGP KeyID: 0xE89B3462
> >(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
> >
> >
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor@stat.math.ethz.ch
> >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348
(Statistics)
> University Park, PA 16802-2111
--
Ram?n D?az-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncol?gicas (CNIO)
(Spanish National Cancer Center)
Melchor Fern?ndez Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)