Thank you very much for the reply. It was quite enlightening and I agree
with almost everything, particularly the idea that there is no substitute for
collecting enough data to have the power to see the changes that you are
looking for. Nevertheless, it will be a long time before we can get away
from analyzing small datasets (3 vs 3, for example). It is often necessary
to perform a small study in order to get preliminary data for a larger one.
In fact, in most cases this would be advisable in order to get an idea of the
technical and biological variability prior to designing the larger study.
Consequently, it is important to be able to analyze small datasets.
Suppose that we have a large dataset from a study with two
conditions (100 vs 100). Assume that there are large, reproducible
differences (many fold, many standard deviations) between the conditions for
a number of genes (1-5% of the data). A t-test can be used on this dataset
to define a group of differentially expressed genes. Now select 3 samples at
random from each group and apply two statistical tests: a t-test and the
Bayesian t-test implemented in Cyber-T. At any significance cut-off, the
genes found to be differentially expressed by the Bayesian t-test will be in
much better agreement with the genes found by a t-test on the full 100 vs 100
dataset than those found by the regular t-test will be.
I think that this is at odds with your conclusion.
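The thought experiment above can be sketched as a small simulation. This is a minimal, stdlib-only Python sketch under stated assumptions, not Cyber-T's code: gene standard deviations are assumed log-normal, 3% of genes are shifted by 4 of their own SDs, and the "Bayesian" test is approximated by a moderated t-statistic that shrinks each gene's pooled variance toward the global mean variance with a hypothetical weight `d0` (Cyber-T, as described in this thread, uses a background variance from genes with similar expression instead). Genes are ranked by |t|, which at a fixed number of degrees of freedom orders them the same way as their p-values.

```python
import math
import random
import statistics


def pooled_var(x, y):
    # With equal group sizes the pooled variance is the mean of the two sample variances.
    return (statistics.variance(x) + statistics.variance(y)) / 2


def t_stat(x, y, var):
    # Two-sample t-statistic for equal group sizes, using the supplied variance.
    n = len(x)
    return (statistics.fmean(y) - statistics.fmean(x)) / math.sqrt(var * 2 / n)


def top_genes(scores, k):
    # Indices of the k genes with the largest |t|.
    return set(sorted(range(len(scores)), key=lambda g: -abs(scores[g]))[:k])


def simulate(n_genes=1000, n_de=30, shift=4.0, n_big=100, n_small=3, d0=4.0, seed=1):
    rng = random.Random(seed)
    # Assumption: gene-specific SDs are log-normal; n_de genes shift by `shift` SDs.
    sds = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(n_genes)]
    de = set(rng.sample(range(n_genes), n_de))

    def draw(g, n, treated):
        mu = shift * sds[g] if (treated and g in de) else 0.0
        return [rng.gauss(mu, sds[g]) for _ in range(n)]

    # Large study (100 vs 100): its ordinary t-test defines the reference gene set.
    big = [(draw(g, n_big, False), draw(g, n_big, True)) for g in range(n_genes)]
    ref = top_genes([t_stat(x, y, pooled_var(x, y)) for x, y in big], n_de)

    # Pick the same 3 control and 3 treated arrays for every gene.
    i_ctrl = rng.sample(range(n_big), n_small)
    i_trt = rng.sample(range(n_big), n_small)
    small = [([x[i] for i in i_ctrl], [y[i] for i in i_trt]) for x, y in big]

    s2 = [pooled_var(x, y) for x, y in small]
    d = 2 * n_small - 2                # degrees of freedom of each gene's own estimate
    s0 = statistics.fmean(s2)          # global prior variance (assumption)
    mod = [(d0 * s0 + d * v) / (d0 + d) for v in s2]

    t_ord = [t_stat(x, y, v) for (x, y), v in zip(small, s2)]
    t_mod = [t_stat(x, y, v) for (x, y), v in zip(small, mod)]
    return len(ref & top_genes(t_ord, n_de)), len(ref & top_genes(t_mod, n_de))


print(simulate())  # (ordinary overlap, moderated overlap) out of 30 reference genes
```

One call reflects a single 3 vs 3 subsample; repeating over many seeds would be needed before drawing any conclusion. Setting `d0=0` turns the moderated statistic back into the ordinary t-test.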
----- Original Message -----
From: "Baker, Stephen" <email@example.com>
Sent: Tuesday, December 16, 2003 5:45 PM
Subject: RE: [BioC] ttest or fold change
> Garrett et al,
> The t-test (or ANOVA) does not have a problem with "accidentally too
> small" variances, either with one or with more than one outcome of
> interest.
> The estimate of the error variance by t-tests and ANOVA is a Least
> Squares estimate and is the UNBIASED ESTIMATOR that is also the lower
> bound on the variance for the "best" (minimum variance) linear unbiased
> estimator (BLUE) of the effects being tested (see Graybill 1976).
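Unbiasedness and the worry about "accidentally too small" variances are not in conflict: unbiasedness is a statement about the average over repeated experiments. A minimal stdlib sketch with 3 replicates (2 degrees of freedom), assuming normally distributed data: because (n-1)s²/σ² follows a chi-square distribution with 2 df, P(s² < σ²/4) = 1 - e^(-1/4), about 0.22, so roughly one gene in five gets a variance estimate below a quarter of its true value even though the mean of s² equals σ².

```python
import random
import statistics


def variance_draws(n=3, sigma=1.0, reps=20000, seed=0):
    # Sample variance (n - 1 denominator) from `reps` independent normal samples of size n.
    rng = random.Random(seed)
    return [statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
            for _ in range(reps)]


draws = variance_draws()
mean_s2 = statistics.fmean(draws)                        # close to sigma**2 = 1 (unbiased)
frac_small = sum(v < 0.25 for v in draws) / len(draws)   # close to 1 - exp(-0.25), about 0.22
print(mean_s2, frac_small)
```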
> Some Bayesian methods can generate smaller estimates of variances by
> biasing the estimate toward some overall measure, such as the average of
> the variances for nearby genes. These are BIASED estimates based on an
> assumption that a particular gene should really be like genes that are
> "nearby" in some sense, such as having similar expression levels.
> You would have to present a lot of data to me to convince me that a
> randomly selected gene should have a variance like some other set of
> genes, especially when I have an unbiased estimate at hand that is
> non-controversial, requires no defense, and uses methods that have
> withstood 100 years of review and scrutiny. I'm familiar with
> estimates of effects that can have a smaller "mean squared error", but
> these are random effects, not variances, which control the power and
> Type I error rate.
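One way to make "nearby genes" concrete, sketched here as an assumption rather than Cyber-T's actual code: rank genes by mean expression and, for each gene, average the variances of the `window` genes closest to it in that ranking. The function name and the default window of 101 genes are hypothetical.

```python
import statistics


def local_background_variance(means, variances, window=101):
    """Hypothetical helper: for each gene, the mean variance of the `window` genes
    nearest to it when genes are sorted by mean expression. The window is centred
    on the gene and shifted inward at the ends so it keeps its full size."""
    order = sorted(range(len(means)), key=lambda g: means[g])
    half = window // 2
    background = [0.0] * len(means)
    for rank, g in enumerate(order):
        lo = max(0, min(rank - half, len(order) - window))
        hi = min(len(order), lo + window)
        background[g] = statistics.fmean([variances[order[i]] for i in range(lo, hi)])
    return background


means = [0.0, 1.0, 2.0, 3.0, 4.0]
variances = [1.0, 1.0, 1.0, 10.0, 10.0]
print(local_background_variance(means, variances, window=3))  # [1.0, 1.0, 4.0, 7.0, 7.0]
```

The choice of window is exactly the kind of analyst decision the paragraph above objects to: a wider window pools more genes, a narrower one trusts expression-level similarity more.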
> These approaches, in addition to producing biased estimates,
> require the analyst to impose his or her own particular biases,
> "prior beliefs" or "priors", as to how much these estimates should be
> biased, by requiring that the analyst specify how much weight is given to
> the data from that gene and how much weight is given to the other genes
> that the gene is supposed to "be more like". Again, it would take
> pretty strong arguments to convince me that any particular analyst's
> prior beliefs about how much the data for a gene, or data from other
> genes, should or should not be weighted are justified. I would be
> concerned about how much convincing a readership, reviewer, or study group
> would need if they ever decided to "open the black box" and asked me to
> explain why such an approach is reasonable/justifiable.
> The program Garrett mentioned, Cyber-T, uses such an approach. To quote
> the Cyber-T manual: "...This weighting factor IS CONTROLLED BY THE
> EXPERIMENTER AND WILL DEPEND ON HOW CONFIDENT THE EXPERIMENTER IS [that]
> the background variance of a closely related set of genes [approximates]
> the variance of the gene under consideration". Now if one were looking
> at just ONE gene, it makes sense that someone might put a lot of
> thought into it, look at a lot of similar genes or other data,
> come to the conclusion that the gene should be like some other genes, and
> THEN use this approach. But this is not the case when you have [...]
> or 22,000 genes, at least not in the world I'm familiar with.
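The weighting the manual describes can be written, in one common form, as a weighted average of the gene's own unbiased variance s_g^2 (with d degrees of freedom) and a background variance σ_{0,g}^2 from closely related genes, where the experimenter-chosen ν_0 acts as the number of pseudo-degrees of freedom given to the background. This is the shape of the moderated variance in Smyth's limma; Cyber-T's exact formula may differ:

$$\tilde{s}_g^2 = \frac{\nu_0\,\sigma_{0,g}^2 + d\,s_g^2}{\nu_0 + d}$$

The "Bayesian" t-statistic then uses \tilde{s}_g^2 in place of s_g^2 in the usual denominator.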
> I use empirical Bayes methods for fitting general linear mixed models,
> where the priors are objective, not my own opinion. Cyber-T does offer
> the option of setting low confidence in the prior, which is an objective
> prior, but the manual points out that this results in the standard
> Student t-test! Another feature of Cyber-T is that when you have
> "enough" data, the weighted approach converges to the standard t-test
> as well.
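Both observations about Cyber-T follow from a weighted form of the variance. A minimal sketch, assuming the limma-style weighting (the function name and the value `nu0=10` are hypothetical): with `nu0 = 0` the gene's own variance is returned unchanged, so the statistic is the ordinary t-test, and as the gene's own degrees of freedom `d` grow, the background's share nu0 / (nu0 + d) shrinks toward zero.

```python
def moderated_variance(s2, d, s0_2, nu0):
    # Weighted variance: nu0 pseudo-degrees of freedom on the background variance
    # s0_2 and d degrees of freedom on the gene's own unbiased estimate s2.
    return (nu0 * s0_2 + d * s2) / (nu0 + d)


s2, s0_2 = 0.05, 1.0  # a gene with an accidentally small variance; background of 1.0
print(moderated_variance(s2, d=4, s0_2=s0_2, nu0=10))  # pulled toward 1.0 (about 0.73)
print(moderated_variance(s2, d=4, s0_2=s0_2, nu0=0))   # 0.05: ordinary t-test variance
for d in (4, 198, 10_000):
    # 3 vs 3 gives d = 4; 100 vs 100 gives d = 198
    print(d, moderated_variance(s2, d=d, s0_2=s0_2, nu0=10))  # approaches 0.05 as d grows
```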
> The real problem that researchers face with microarrays is NOT that
> their t-test variances are too small, but that they often have
> insufficient sample sizes to detect the differences they need to detect.
> The ready solution is to get enough data.
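How much data is "enough" can be estimated before the study. A rough sketch using the normal approximation to a two-sided, two-sample test; it ignores the heavier tails of the t-distribution, so it is optimistic for very small n. Here `delta` is the true difference in means and `sigma` the within-group SD, in the same (e.g. log2) units; the function names are hypothetical. With thousands of genes tested at once, an `alpha` much smaller than 0.05 would normally be used, which raises the required n.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    # Normal-approximation power of a two-sided, two-sample test with n per group.
    z = NormalDist()
    se = sigma * (2 / n) ** 0.5
    crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / se - crit) + z.cdf(-delta / se - crit)


def n_per_group(delta, sigma, power=0.8, alpha=0.05):
    # Smallest n per group whose approximate power reaches the target.
    n = 2
    while approx_power(delta, sigma, n, alpha) < power:
        n += 1
    return n


print(round(approx_power(1.0, 1.0, 3), 2))  # 3 vs 3, 1-SD difference: roughly 0.23
print(n_per_group(1.0, 1.0))                # 16 per group at alpha = 0.05, power 0.8
print(n_per_group(1.0, 1.0, alpha=0.001))   # more arrays for a stricter per-gene alpha
```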
> -.- -.. .---- .--. ..-.
> Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625
> Sr. Biostatistician- Information Services
> Lecturer in Biostatistics (775) 254-4885 fax
> Graduate School of Biomedical Sciences
> University of Massachusetts Medical School, Worcester
> 55 Lake Avenue North
> Worcester, MA 01655 USA
> Date: Tue, 16 Dec 2003 10:24:31 -0500
> From: "Garrett Frampton" <firstname.lastname@example.org>
> Subject: RE: [BioC] ttest or fold change
> To: <email@example.com>
> Message-ID: <00b801c3c3e8$b3ed2cc0$e1be299b@GARRETT>
> Dr. Baker,
> You wrote about "the problem" that the t-test denominator may be
> accidentally "too small". You say that this issue has been solved
> within the t-test. It is my belief that this problem has only been
> partially solved. It is true that this "problem" has been solved for a
> single hypothesis test within the t-test, but it has not been solved for
> microarray data analysis as a whole.
> It is possible to gain power by using local estimates of variance based
> upon more than one gene. This sort of approach is extremely useful in
> experiments with only a few replicates, because it deals with the
> situation where the within-group variance for a single gene happens to
> be very small. This is the approach implemented in Cyber-T. By looking at
> the dataset as a whole, rather than one gene at a time, it is possible to
> eliminate false positives that arise as a result of coincidentally low
> within-group variance.
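The mechanism described above can be seen even when no gene is truly changed. A minimal stdlib sketch, assuming normal data with a common true variance of 1 and 3 vs 3 replicates (the function name and parameters are hypothetical): the genes with the largest ordinary |t| are disproportionately those whose pooled variance happened to come out small, so the returned ratio falls well below 1.

```python
import math
import random
import statistics


def top_hit_variance_ratio(n_genes=2000, n=3, k=50, seed=0):
    """Under the null (no gene differs), the median pooled variance of the k genes
    with the largest |t|, divided by the median pooled variance of all genes."""
    rng = random.Random(seed)
    variances, tstats = [], []
    for _ in range(n_genes):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = (statistics.variance(x) + statistics.variance(y)) / 2  # pooled, equal n
        variances.append(v)
        tstats.append((statistics.fmean(y) - statistics.fmean(x)) / math.sqrt(v * 2 / n))
    top = sorted(range(n_genes), key=lambda g: -abs(tstats[g]))[:k]
    return statistics.median([variances[g] for g in top]) / statistics.median(variances)


print(top_hit_variance_ratio())  # well below 1: top "hits" have unusually small variances
```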
> Do you agree?
> Other than this minor point, I think you did a wonderful job putting
> statistical concepts that so many struggle with into words.
> Garrett Frampton
> Research Associate
> Boston University School of Medicine - Microarray Resource
> Bioconductor mailing list