Hi Jakob,
it can be misleading to look solely at the CV of replicates to assess
normalization. If you did that, a normalization method that simply
divided all your log-ratios by 2 would look twice as good, and one that
set everything to zero would look even better.
What I usually do is look at the distribution of per-gene F- or
t-statistics across arrays for some meaningful biological grouping of
the samples. This requires enough replicate arrays within each group.
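
As a rough illustration of the per-gene statistics idea, here is a
minimal sketch using only the Python standard library. The gene names,
group labels, and log-ratio values are made up for the example, and the
Welch form of the t-statistic is just one possible choice, not
necessarily the one used in any particular package.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch two-sample t-statistic for one gene."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se

def per_gene_t(log_ratios, groups):
    """log_ratios: {gene: [one value per array]}; groups: "A" or "B" per array."""
    stats = {}
    for gene, values in log_ratios.items():
        a = [v for v, g in zip(values, groups) if g == "A"]
        b = [v for v, g in zip(values, groups) if g == "B"]
        stats[gene] = welch_t(a, b)
    return stats

# Hypothetical normalized log-ratios: 6 arrays, 3 per biological group.
groups = ["A", "A", "A", "B", "B", "B"]
normalized = {
    "gene1": [1.0, 1.2, 0.8, -1.0, -0.8, -1.2],  # clear group difference
    "gene2": [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],   # no group difference
}

t_stats = per_gene_t(normalized, groups)

# Dividing every log-ratio by 2 leaves the t-statistics unchanged,
# so this criterion cannot be gamed by simply shrinking the data.
halved = {gene: [v / 2 for v in vals] for gene, vals in normalized.items()}
t_halved = per_gene_t(halved, groups)

for gene in normalized:
    print(gene, round(t_stats[gene], 3), round(t_halved[gene], 3))
```

In practice one would compute these statistics for all genes under each
candidate normalization and compare the resulting distributions.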
Still, if you used a "reasonable" normalization method, it sounds as if
it didn't work well on your data. It is hard to say more without more
details on what you did, diagnostic plots, etc.
Best regards
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
-------------------------------------
Jakob Hedegaard wrote:
> Hi list
>
> I am working on a data set from 24 arrays, where each array consists of
> 6,912 spots replicated pairwise at two different spatial locations.
>
> For quality evaluation, I have calculated the CV of "raw" log-ratios for
> each pairwise replicated spot (13,824 points per array) and have
> observed the expected tendency of decreasing CV with increasing average
> spot intensity.
>
> When calculating the CV for normalized data, I have observed that the CV
> has increased compared to the CV for raw data. This essentially means that
> normalization is making the data worse in terms of variance among
> replicated spots!
>
> Has anybody observed something similar?
>
> Is this what should be expected, or does it indicate that the
> normalization is not optimally performed?
>
> Looking forward to hearing from you!
>
> Jakob
>
> -------------------------------------------------------------------
> Jakob Hedegaard
> Danish Institute of Agricultural Sciences
> Department of Genetics and Biotechnology
> Molecular Genetics and System Biology
> Building K25
> Research Centre Foulum
> P.O. box 50
> DK-8830 Tjele
> Denmark
> Tel: (+45)89991363
> Fax: (+45)89991300
Hi Wolfgang and Jakob
I think there is some confusion here. The CV is (at least as far as I
know) the standard deviation divided by the mean, so it is
scale-invariant, i.e. dividing all log-ratios by 2 shouldn't make a
difference. It is not location-invariant though, which could explain
the increased CV: normalisation centers the log-ratio distribution, so
for most genes the mean should be closer to 0 than before, and dividing
by a smaller mean results in a larger CV.
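
A small numerical sketch of this point, using made-up log-ratios for a
single replicated spot (the values and the shift of 0.9 are purely
illustrative): rescaling leaves the CV unchanged, while moving the mean
towards 0 inflates it even though the standard deviation stays the same.

```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical log-ratios of one spot across replicates (mean 1.0).
raw = [0.9, 1.1, 1.0, 1.2, 0.8]

halved = [x / 2 for x in raw]      # rescaling: divide every log-ratio by 2
shifted = [x - 0.9 for x in raw]   # centering: mean moves from 1.0 to 0.1

sd_raw, sd_halved, sd_shifted = (statistics.stdev(v) for v in (raw, halved, shifted))
cv_raw, cv_halved, cv_shifted = (cv(v) for v in (raw, halved, shifted))

print("SD:", sd_raw, sd_halved, sd_shifted)
print("CV:", cv_raw, cv_halved, cv_shifted)
```

The standard deviation halves under rescaling and is untouched by the
shift, whereas the CV behaves the other way round.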
For that reason the CV is not an appropriate tool here to assess the
effect of the normalisation. As Wolfgang points out, the distribution
of F- or t-statistics (or the corresponding p-values) should be a
reasonable (and scale-invariant!) exploratory tool to assess the
success of the normalisation.
Best Wishes
Claus
--
**********************************************************************
Claus-D. Mayer | http://www.bioss.ac.uk
Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
Rowett Research Institute | Telephone: +44 (0) 1224 716652
Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349
Hi Claus,
thanks for pointing this out. This slipped through because the
standard deviation of log(x) is approximately equal to the CV of x if
the latter is not too large (this can be seen from a first-order
expansion), so when I talked about the "CV of replicates" I meant the
standard deviation of their log-ratios.
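
To spell out the first-order expansion: for x close to its mean m,
log(x) is approximately log(m) + (x - m)/m, so sd(log x) is
approximately sd(x)/m, which is the CV of x. A minimal numerical check
with made-up intensity values (both data sets are hypothetical):

```python
import math
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def sd_log(values):
    """Sample standard deviation of the natural logarithms."""
    return statistics.stdev(math.log(v) for v in values)

# Hypothetical intensities with a small relative spread (CV of a few percent).
tight = [95, 100, 105, 98, 102]
# Hypothetical intensities with a large relative spread (CV close to 1).
wide = [20, 60, 100, 180, 400]

cv_tight, sd_log_tight = cv(tight), sd_log(tight)
cv_wide, sd_log_wide = cv(wide), sd_log(wide)

print("tight:", cv_tight, sd_log_tight)  # nearly identical
print("wide: ", cv_wide, sd_log_wide)    # approximation breaks down
```

For the tightly clustered values the two quantities agree closely; for
the widely spread ones they differ noticeably, which is the "not too
large" caveat.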
However, in his mail Jakob referred to the "CV of log-ratios", and you
are absolutely right: these are not appropriate.
Best wishes
Wolfgang