The problem with p-values is that they measure the "surprise factor", not
the size of the effect. Suppose that you are testing a cholesterol
drug, and it really has the effect of lowering mean cholesterol (over the
population) by .001. Does anyone care? (Cholesterol values generally range
from about 100-400.) But if your sample size is big enough, you have the
power to detect infinitesimally small differences.
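
To put rough numbers on the cholesterol example, here is a minimal Python sketch. It assumes two equal-sized groups, a known between-subject standard deviation of 40 (a made-up value, picked only because the values span roughly 100-400), and a two-sided z-test; the function name is hypothetical. It reports the p-value you would get if the observed difference were exactly the true difference of .001.

```python
import math

def p_value_at_true_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value when the observed mean difference equals `diff`.

    Assumes two independent groups of size n_per_group with known SD `sd`.
    """
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference in means
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2.0))     # two-sided normal tail probability

TRUE_DIFF = 0.001   # effect from the cholesterol example
ASSUMED_SD = 40.0   # hypothetical spread of cholesterol values (assumption)

for n in (10**4, 10**8, 10**10, 10**12):
    p = p_value_at_true_difference(TRUE_DIFF, ASSUMED_SD, n)
    print(f"n per group = {n:>17,d}   effect = {TRUE_DIFF}   p = {p:.3g}")
```

The effect stays at .001 in every row; only the p-value changes with the sample size.
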

For the purpose of normalization, we probably want the probe distributions
to be similar. If they are already identical, we do not need to
normalize. So, with a sufficiently large sample, all we will learn is that
the probe distributions are not identical - but not how far apart they are.
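
One way to keep "how far apart" separate from "significant or not" is to report the Kolmogorov-Smirnov distance D itself alongside its p-value. The sketch below is only an illustration in plain Python on simulated data: the two samples are hypothetical "arrays" whose values differ by a shift of 0.1 standard deviations, and the p-value uses the usual asymptotic series with a small-sample correction, so it should be treated as approximate.

```python
import math
import random

def ks_two_sample(a, b):
    """Return (D, approximate two-sided p-value) for two samples."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d        # small-sample correction (approximation)
    if lam < 0.2:
        return d, 1.0                        # the series below is numerically 1 here
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

random.seed(1)
SHIFT = 0.1   # hypothetical small location difference, in SD units

def simulate(n):
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(SHIFT, 1.0) for _ in range(n)]
    return ks_two_sample(a, b)

d_small, p_small = simulate(200)
d_large, p_large = simulate(50_000)
print(f"n = 200:    D = {d_small:.3f}, p = {p_small:.3g}")
print(f"n = 50000:  D = {d_large:.3f}, p = {p_large:.3g}")
```

With the larger sample the p-value becomes tiny even though D stays small, so D (or some other distance between the distributions) is the number to judge against what matters for normalization.
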
At 10:58 AM 3/16/2004, Arne.Muller@aventis.com wrote:
>I've two questions regarding the suggestions from Naomi.
>1. I've had a look at some density plots (*after* rma bgcorrect +
>normalisation across all chips of my experiment). The tails of the
>distributions are very similar, whereas at high density some plots differ in shape or
>When/how would you consider the two distributions to be equal?
>2. As a non-statistician I'm a bit confused that statistical tests
>always find a significant difference between distributions when the samples
>are large (I remember someone mentioned this to me - without
>about 2 years ago in a posting to the R-list). Is there a way to
>the test results (e.g. the p-values) by the size of the sample?
>I guess such a significant difference as reported by a test is a real
>difference (otherwise all statistical tests would be worthless ...).
>assume that even if the two distributions are statistically different, I
>can treat them as equal judged by visual investigation of a
>What is a large sample? If a test finds a difference between two
>distributions, how do I know it's not just because of the sample size? Is
>there something like a "maximum sample size test" (similar to the
>power of a test)?
>Thanks again for your comments,
>kind regards,
>Arne Muller, Ph.D.
>Toxicogenomics, Aventis Pharma
>arne dot muller domain=aventis com
> > -----Original Message-----
> > From: email@example.com
> > [mailto:firstname.lastname@example.org]On Behalf Of
> > Naomi Altman
> > Sent: 15 March 2004 16:05
> > To: Stan Smiley; Bioconductor Mailing list
> > Subject: Re: [BioC] Quantile normalization vs. data distributions
> > This is a very good question that I have also been puzzling
> > over. It seems
> > useless to try
> > tests of equality of the distribution such as
> > Kolmogorov-Smirnov - due to
> > the huge sample size you
> > would almost certainly get a significant result.
> > Currently, I am using the following graphical method:
> > 1. I compute a kernel density estimate of the combined data
> > of all probes
> > on all the arrays.
> > 2. I compute a kernel density estimate of the data for each array.
> > 3. I plot both smooths on the same plot, and decide if they
> > are the same.
> > Looking at what I wrote above, I think it would be better in
> > steps 1 and 2
> > to background correct and
> > center each array before combining. It might also be better
> > to reduce the
> > data to standardized scores before combining, unless
> > you think that the overall scaling is due to your "treatment".
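
The graphical method above, with each array centered before combining, can be sketched roughly as follows. This is a plain-Python illustration on simulated data, not the code used for the analysis described: the Gaussian kernel, the normal-reference bandwidth, centering by the median, and the "integrated absolute difference" summary are all assumptions added for the example. Background correction is left out, and a real analysis would plot the curves rather than only summarise them.

```python
import math
import random
import statistics

def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` at each grid point."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)
            for x in grid]

def center(values):
    """Subtract the median so arrays are compared on shape, not location."""
    m = statistics.median(values)
    return [v - m for v in values]

# Simulated log-intensities for three hypothetical arrays; the third is more spread out.
random.seed(0)
arrays = [
    [random.gauss(8.0, 1.0) for _ in range(500)],
    [random.gauss(8.5, 1.0) for _ in range(500)],
    [random.gauss(8.0, 2.0) for _ in range(500)],
]
centered = [center(a) for a in arrays]
pooled = [v for a in centered for v in a]

step = 0.2
grid = [-8.0 + step * k for k in range(81)]
# One shared bandwidth (normal-reference rule of thumb) keeps the curves comparable.
bandwidth = 1.06 * statistics.stdev(pooled) * len(pooled) ** (-0.2)

pooled_density = gaussian_kde(pooled, grid, bandwidth)                  # step 1
array_densities = [gaussian_kde(a, grid, bandwidth) for a in centered]  # step 2

# Step 3, summarised numerically: area between each array's curve and the pooled curve.
discrepancy = [
    step * sum(abs(f - g) for f, g in zip(dens, pooled_density))
    for dens in array_densities
]
for k, d in enumerate(discrepancy, start=1):
    print(f"array {k}: integrated |difference| from pooled density = {d:.3f}")
```

The summary ranges from 0 (identical curves) to 2 (no overlap), so it gives a rough scale for "how different", though where to draw the line is still a judgment call.
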
> > It seems like half of what I do is ad hoc, so I always welcome any
> > criticisms or suggestions.
> > --Naomi Altman
> > At 06:07 PM 3/11/2004, Stan Smiley wrote:
> > >Greetings,
> > >
> > >I have been trying to find a quantitative measure to tell
> > when the data
> > >distributions
> > >between chips are 'seriously' different enough from each
> > other to violate
> > >the
> > >assumptions behind quantile normalization. I've been through
> > the archives
> > >and seen some discussion of this matter, but didn't come away with a
> > >quantitative measure I
> > >could apply to my data sets to assure me that it would be OK
> > to use quantile
> > >normalization.
> > >
> > >
> > >"Quantile normalization uses a single standard for all
> > chips, however it
> > >assumes that no serious change in distribution occurs"
> > >
> > >Could someone please point me in the right direction on this?
> > >
> > >Thanks.
> > >
> > >Stan Smiley
> > >email@example.com
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor@stat.math.ethz.ch
> > >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> > Naomi S. Altman 814-865-3791
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics 814-863-7114
> > Penn State University 814-865-1348
> > (Statistics)
> > University Park, PA 16802-2111
Naomi S. Altman 814-865-3791 (voice)
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348
University Park, PA 16802-2111