Leslie Cope
The tests already take sample size into account, which is part of the
problem. If two datasets really come from the exact same distribution,
then as sample size increases, histograms, density plots, summary
statistics and so on will get closer and closer to one another. The
tests take this into account.
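To make the sample-size effect concrete, here is a minimal Python sketch
(not from the original thread). It assumes a two-sample
Kolmogorov-Smirnov test as a stand-in for "the standard tests", uses
simulated normal data, and picks an arbitrary shift of 0.1 standard
deviations between the two groups. The helper name ks_two_sample and
all numeric settings are illustrative choices, and the p-value comes
from the usual large-sample approximation.

```python
import math
import random


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the ECDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Large-sample approximation: P(sqrt(nm/(n+m)) * D > lam).
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The alternating series is unreliable here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


random.seed(0)
SHIFT = 0.1  # assumed small difference in means, in standard deviations

results = []
for size in (100, 1000, 20000):
    a = [random.gauss(0.0, 1.0) for _ in range(size)]
    b = [random.gauss(SHIFT, 1.0) for _ in range(size)]
    d, p = ks_two_sample(a, b)
    results.append((size, d, p))
    print(f"n = {size:6d}   D = {d:.4f}   p = {p:.3g}")
```

The underlying difference is the same in every run; only the number of
observations changes. Because the test's threshold for D shrinks as the
sample grows, the same small shift eventually gets flagged as highly
significant.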
This becomes a problem in our case because we know that, even with the
large number of genes on a chip, there are differences in distribution
from chip to chip. Some of these differences don't matter for quantile
normalization. For example, a simple difference in means would
obviously not be a problem for quantile normalization. Nor would a
simple difference in variance. These and more complicated differences
between distributions can be accounted for when building tests, but the
standard tests themselves are blind and can't tell the distributional
differences we care about from those we don't.
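As a rough illustration of why a pure shift in mean or change in
variance is harmless, here is a small Python sketch of quantile
normalization on two simulated chips. It is only one reading of the
point above, not code from the thread: the function quantile_normalize
is a hypothetical helper, the reference distribution is assumed to be
the rank-wise mean across chips, and chip_b is constructed as a shifted
and rescaled copy of chip_a.

```python
import random
import statistics


def quantile_normalize(chips):
    """Give every chip the same distribution: the rank-wise mean across chips."""
    n = len(chips[0])
    # For each chip, the gene indices ordered from smallest to largest value.
    orders = [sorted(range(n), key=chip.__getitem__) for chip in chips]
    # Reference value at rank r: mean of the r-th smallest value on each chip.
    reference = [
        sum(chip[order[r]] for chip, order in zip(chips, orders)) / len(chips)
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


random.seed(1)
n_genes = 1000  # illustrative size, far smaller than a real chip

# Simulated log-intensities; chip_b differs only in location and scale.
chip_a = [random.gauss(8.0, 1.0) for _ in range(n_genes)]
chip_b = [3.0 + 2.0 * value for value in chip_a]

norm_a, norm_b = quantile_normalize([chip_a, chip_b])

print("means before:", statistics.mean(chip_a), statistics.mean(chip_b))
print("means after: ", statistics.mean(norm_a), statistics.mean(norm_b))
print("largest per-gene difference after normalization:",
      max(abs(u - v) for u, v in zip(norm_a, norm_b)))
```

A shift or rescaling leaves the ranking of genes within a chip
unchanged, so both chips are mapped onto the same reference values gene
by gene. A distribution-level test applied to the raw chips would still
report them as very different, which is the blindness described above.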
And for that matter, it is evident from recent discussion in this forum
that no one is sure which differences we should care about and which
don't matter. Trying to figure that out is the whole point of this
thread. Because of that, I suspect that you will not get a nice clean
answer to your first question at this time.
Leslie Cope, Ph.D.
Oncology Biostatistics, JHU
> 2. As a non-statistician I'm a bit confused that statistical tests
> will nearly always find a significant difference between
> distributions when the samples are large (I remember someone
> mentioned this to me - without explanations - about 2 years ago in a
> posting to the R-list). Is there a way to "normalize" the test
> results (e.g. the p-values) by the size of the sample?
>
> I guess such a significant difference as reported by a test is a
> *real* difference (otherwise all statistical tests would be
> worthless ...). Can one assume that, even if the two distributions
> are statistically different, one can treat them as equal judged by
> visual investigation of a density plot or histogram?