Question: Wilcoxon test [was loged data or not loged previous to use normalize.quantile]

14.3 years ago by Gordon Smyth, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

There are many different permutation tests, and a properly designed
permutation test can be very general indeed. To be specific, I'm
referring to the Wilcoxon two-sample rank test (aka the Mann-Whitney
test), which is equivalent to a particular permutation test.
Over many years as a statistician, I've heard it said so many times,
"the variances were not equal so I used a Wilcoxon two-sample test
instead of a t-test" or "I used a rank test, which is assumption free".
Like Naomi, I find it frustrating that this misunderstanding is so
common. The fact is that all tests make some assumptions, and
inequality of population variances under the null hypothesis breaks the
Wilcoxon test just as it does the pooled t-test. I don't know which
test breaks down more quickly, but I certainly haven't seen any
evidence that the Wilcoxon test is more robust than the t-test to
inequality of variances.
It is easy to confirm that the Wilcoxon test breaks down under
inequality of variances, either by a simulation or just with a
back-of-the-envelope calculation. Suppose for example that you are
testing equality of means (or medians) of two populations with sample
sizes n1 and n2. Suppose that the two populations have equal medians
but that population 1 has a very much larger variance than population
2. Each observation in sample 1 then falls above the whole of sample 2
about half the time, so the two samples will separate, with all of
sample 1 larger than sample 2, with probability close to 1/2^n1.
However, the one-sided Wilcoxon test p-value in such a case will be
1/choose(n1+n2,n1), a very much smaller quantity. Suppose for example
that n1=5, n2=10. Then the p-value will be evaluated by the Wilcoxon
test as
> 1/choose(5+10,5)
[1] 0.0003330003
but the actual size of the test is
> 1/2^5
[1] 0.03125
which is almost 100 times (about 94 times) the nominal p-value. This
shows that the Wilcoxon test does not hold its size under inequality of
variances.
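As a rough check of the calculation above, here is a minimal simulation sketch in R. The normal populations, the standard deviations (1000 and 1), the seed, and the number of replicates are all illustrative assumptions and are not part of the original post. It counts how often the one-sided exact Wilcoxon test reports its smallest possible p-value when the two populations share a median but differ greatly in spread.

```r
# Sketch only: normal populations and all numeric settings are assumptions.
set.seed(1)                       # arbitrary seed, for reproducibility
n1 <- 5
n2 <- 10
nsim <- 10000                     # number of simulated data sets

# Smallest possible one-sided p-value: all of sample 1 above all of sample 2
p_min <- 1 / choose(n1 + n2, n1)

pvals <- replicate(nsim, {
  x <- rnorm(n1, mean = 0, sd = 1000)  # population 1: same median, huge spread
  y <- rnorm(n2, mean = 0, sd = 1)     # population 2: same median, small spread
  wilcox.test(x, y, alternative = "greater", exact = TRUE)$p.value
})

# Proportion of data sets that attain the smallest p-value
# (small tolerance guards against floating-point rounding)
freq_min <- mean(pvals <= p_min * (1 + 1e-8))

print(c(nominal = p_min, observed = freq_min, predicted = 1 / 2^n1))
```

If the argument holds, `observed` should land near `predicted` (1/2^5) and far above `nominal`, which is the frequency a valid test would give for that p-value.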
>[BioC] loged data or not loged previous to use normalize.quantile
>Rhonda DeCook rdecook at iastate.edu
>Tue Apr 5 17:51:09 CEST 2005
>
>With respect to permutations tests...
>
>I'm under the impression that you only need independence, not the
>assumption of constant variance.
No, independence is not enough, as you say yourself in the next sentence.
>The permutation test provides us with a distribution of the test statistic
>under the null hypothesis (equal means in the 2-sample scenario, i.e. all
>data was generated from one distribution, even though it may be an ugly
>looking single distribution).
You are saying that the observations must be independent and "from one
distribution", i.e., iid, exactly as Naomi said.
The whole point is that, if the population variances are not equal,
then the two samples cannot be from the same distribution.
> As long as all 'groupings' of the data into 2 groups are
>equally likely (which is provided by the independence assumption)
For all groupings to be equally likely, you need the two populations
to have the same shape, and this includes equality of variances.
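The point about groupings can be illustrated with a small exact permutation test. The R sketch below is an assumption-laden illustration, not code from the thread: the helper names `perm_pvalue` and `size_at`, the normal populations, the standard deviations, and the 0.05 level are all hypothetical choices. It enumerates every relabelling of 15 values into groups of 5 and 10, then estimates how often a one-sided permutation test rejects when both groups have mean 0.

```r
# Sketch only: helper names, distributions and settings are assumptions.
set.seed(2)                       # arbitrary seed, for reproducibility
n1 <- 5
n2 <- 10
idx <- combn(n1 + n2, n1)         # all 3003 choices of which values form group 1

# One-sided exact permutation p-value. With fixed group sizes, ranking
# relabellings by the sum of group 1 is equivalent to ranking them by
# mean(x) - mean(y).
perm_pvalue <- function(x, y) {
  z <- c(x, y)
  perm <- colSums(matrix(z[idx], nrow = n1))
  obs <- perm[1]                  # first column of idx is 1:n1, the observed labelling
  mean(perm >= obs)
}

# Estimated rejection rate at level alpha when both groups have mean 0
size_at <- function(sd1, nsim = 2000, alpha = 0.05) {
  p <- replicate(nsim, perm_pvalue(rnorm(n1, 0, sd1), rnorm(n2, 0, 1)))
  mean(p <= alpha)
}

size_equal   <- size_at(sd1 = 1)    # both groups from the same distribution
size_unequal <- size_at(sd1 = 100)  # group 1 far more variable, same mean

print(c(equal = size_equal, unequal = size_unequal))
```

With identical distributions the rejection rate should stay close to the nominal 0.05. With the smaller group far more variable, it should come out clearly higher, because the relabellings are no longer equally likely.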
Gordon
> this permutation distribution of the test statistic (e.g. a t-statistic
>here) gives us an idea of the test statistic's distribution under the null
>without the assumption of normality or constant variance. Computing a
>permutation p-value from this null distribution provides a p-value that has
>the usual behavior under the null, or Uniform(0,1) though in a discrete
>manner. When the alternative is true, the distribution of the p-value will
>have more mass near zero than the Uniform(0,1).
>
>If this logic doesn't apply to the microarray setting, please let me know.
>
>Rhonda
>
>
> > I just want to remind people that permutation tests, rank tests, etc still
> > require i.i.d. errors. So the variance needs to be stabilized even for
> > nonparametric tests.
> >
> > --Naomi
> >
> > At 01:32 PM 4/4/2005, Fangxin Hong wrote:
> > >Hi Marcelo;
> > >As Wolfgang mentioned, a non-parametric permutation test is an option
> > >when the t-distribution assumption is not valid. But if you have few
> > >replications (2-3), most permutation tests don't have power either. I
> > >would suggest you try the RankProd package, which would be powerful
> > >enough to detect differentially expressed genes with 2 replications.
> > >
> > >Bests;
> > >Fangxin
> > >
> > >
> > >
> > > > Hi Marcelo,
> > > >
> > > > > the difference is that the power of the test you are doing can be
> > > > > different when you consider the data on the "raw" or on the
> > > > > log-transformed scale.
> > > >
> > > > > Also, the p-value calculated by limma is based on the assumption that
> > > > > the null-distribution of the test statistic is given by a
> > > > > t-distribution; this assumption might be more or less true in both
> > > > > cases.
> > > >
> > > > > You are really doing two different tests: test A, say, consists of
> > > > > applying the t-statistic to the untransformed intensities, test B, say,
> > > > > applying the t-statistic to the transformed intensities.
> > > >
> > > > > Then, if you want to use the t-distribution for getting p-values, you
> > > > > need to make sure that the null distribution of your test statistic
> > > > > is indeed (to good enough approximation) t-distributed. You can do this
> > > > > e.g. by permutations. For that you need either a large number of
> > > > > replicates, or to pool variance estimators across genes.
> > > >
> > > > > If you don't want to make a parametric assumption for getting p-values,
> > > > > you need a larger number of replicates; if you have these, you can for
> > > > > example calculate a permutation p-value.
> > > >
> > > > > So, there is really no "right" or "wrong" about transforming, or which
> > > > > transformation -- as long as you don't violate the assumptions of the
> > > > > subsequent tests. If the assumptions are met, then the procedure with
> > > > > the highest power is preferable. And that depends very much on your
> > > > > data (about which you have not told us much.)
> > > >
> > > > Hope that helps.
> > > >
> > > > And here is another shameless plug: have a look at this paper:
> > > > Differential Expression with the Bioconductor Project
> > > > http://www.bepress.com/bioconductor/paper7
> > > >
> > > > Best wishes
> > > > Wolfgang
> > > >
> > > > Marcelo Luiz de Laia wrote:
> > > >> Dear Bioconductors Friends,
> > > >>
> > > > >> I have a question that I have not found an answer to. If you have a
> > > > >> paper/article that explains it, please tell me.
> > > >>
> > > >> I normalize our data using normalize.quantile function.
> > > >>
> > > > >> If I first transform our intensities (single channel) to log2, I don't
> > > > >> get differentially expressed genes in limma.
> > > >>
> > > > >> But if I don't transform our data, I get some genes with p.value around
> > > > >> 0.0001, which is great!
> > > >>
> > > > >> Of course, when I transform the intensities data to log2, I get some NA.
> > > >>
> > > > >> Why is there this difference? Am I wrong to do an analysis with data
> > > > >> that are not logged?
> > > >>
> > > >> Thanks a lot
> > > >>
> > > >> Marcelo
> > > >>
> > > >> _______________________________________________
> > > >> Bioconductor mailing list
> > > >> Bioconductor at stat.math.ethz.ch
> > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >
> > > >
> > > > --
> > > > Best regards
> > > > Wolfgang
> > > >
> > > > -------------------------------------
> > > > Wolfgang Huber
> > > > European Bioinformatics Institute
> > > > European Molecular Biology Laboratory
> > > > Cambridge CB10 1SD
> > > > England
> > > > Phone: +44 1223 494642
> > > > Fax: +44 1223 494486
> > > > Http: www.ebi.ac.uk/huber
> > > >
> > > >
> > > >
> > >
> > >
> > >--
> > >Fangxin Hong, Ph.D.
> > >Plant Biology Laboratory
> > >The Salk Institute
> > >10010 N. Torrey Pines Rd.
> > >La Jolla, CA 92037
> > >E-mail: fhong at salk.edu
> > >
> >
> > Naomi S. Altman                  814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics              814-863-7114 (fax)
> > Penn State University            814-865-1348 (Statistics)
> > University Park, PA 16802-2111

modified 14.3 years ago by Claus Mayer • written 14.3 years ago by Gordon Smyth