Question

Increase in CV of replicated spots after normalization?

Jakob Hedegaard ▴ 170

@jakob-hedegaard-823

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20050617/ 43c25640/attachment.pl

• 699 views

ADD COMMENT • link updated 18.9 years ago by Wolfgang Huber ★ 13k • written 18.9 years ago by Jakob Hedegaard ▴ 170

score 0 · Answer 1 · 2005-06-18

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Hi Jakob,

it can be misleading to look solely at the CV of replicates to assess normalization. If you did that, a normalization method that simply divided all your log-ratios by 2 would be twice as good, and one that set everything to zero would be even better.

What I usually do is look at the distribution of F- or t-statistics per gene across arrays for some meaningful biological grouping of the samples. There need to be enough replicate arrays within each group for this.

Still, if you used a "reasonable" normalization method, it sounds like it didn't work well on your data. It is hard to say more without more details on what you did, diagnostic plots, etc.

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
-------------------------------------

Jakob Hedegaard wrote:

> Hi list
>
> I am working on a data set from 24 arrays, where each array consists of
> 6,912 spots replicated pairwise at two different spatial locations.
>
> For quality evaluation, I have calculated the CV of "raw" log-ratios for
> each pairwise replicated spot (13,824 points per array) and have
> observed the expected tendency of decreasing CV with increasing average
> spot intensity.
>
> When calculating the CV for normalized data, I have observed that the CV
> has increased compared to the CV for raw data. This essentially means that
> normalization is making the data worse in terms of variance among replicated
> spots!
>
> Has anybody observed something similar?
>
> Is this what should be expected, or does it indicate that the
> normalization is not optimally performed?
>
> Looking forward to hearing from you!
>
> Jakob
>
> -------------------------------------------------------------------
> Jakob Hedegaard
> Danish Institute of Agricultural Sciences
> Department of Genetics and Biotechnology
> Molecular Genetics and System Biology
> Building K25
> Research Centre Foulum
> P.O. box 50
> DK-8830 Tjele
> Denmark
> Tel: (+45)89991363
> Fax: (+45)89991300
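
As a rough illustration of the t-statistic suggestion in the answer above, here is a minimal Python sketch for a single gene. The log-ratio values, the group names `treated` and `control`, and the helper `welch_t` are all hypothetical, and Welch's t-statistic is only one possible choice; the thread does not say which test was meant. The sketch shows that halving every log-ratio halves the replicate standard deviation but leaves the t-statistic unchanged, which is why the latter is the safer yardstick.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent groups of log-ratios."""
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical log-ratios for ONE gene on 4 + 4 arrays (made-up numbers).
treated = [1.2, 0.9, 1.4, 1.1]
control = [0.1, -0.2, 0.3, 0.0]

# A useless "normalization": divide every log-ratio by 2.
treated_halved = [x / 2 for x in treated]
control_halved = [x / 2 for x in control]

sd_before = statistics.stdev(treated)
sd_after = statistics.stdev(treated_halved)
t_before = welch_t(treated, control)
t_after = welch_t(treated_halved, control_halved)

print(f"replicate SD: {sd_before:.4f} -> {sd_after:.4f}")
print(f"t-statistic:  {t_before:.4f} -> {t_after:.4f}")
```

In practice one would compute such a statistic for every gene and compare the resulting distributions before and after normalization, rather than looking at a single gene.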

ADD COMMENT • link 18.9 years ago Wolfgang Huber ★ 13k

Hi Wolfgang and Jakob,

I think there is some confusion here. The CV is (at least as far as I know) the standard deviation divided by the mean, so it is scale-invariant, i.e. dividing all log-ratios by 2 shouldn't make a difference. It is not location-invariant though, which could explain the increased CV. The normalisation centers the log-ratio distribution, so for most genes the mean should be closer to 0 than before, which will result in an increased CV.

For that reason the CV is not an appropriate tool here to assess the effect of the normalisation. As Wolfgang points out, the distribution of F- or t-statistics (or the corresponding p-values) should be a reasonable (and scale-invariant!) exploratory tool to assess the success of the normalisation.

Best wishes
Claus

--
Claus-D. Mayer | http://www.bioss.ac.uk
Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
Rowett Research Institute | Telephone: +44 (0) 1224 716652
Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349
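
The scale-versus-location argument in this reply can be checked with a small Python sketch. The replicate values and the shift of 0.9 are made up for illustration, and the helper `cv` (standard deviation divided by the absolute mean) is a hypothetical name, not something from the thread. Subtracting a constant stands in, very crudely, for a normalization that moves the mean log-ratio towards 0.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample SD divided by the absolute mean."""
    return statistics.stdev(values) / abs(statistics.fmean(values))


# Hypothetical replicate log-ratios with mean 1.0 (made-up numbers).
raw = [0.80, 1.00, 1.20, 0.90, 1.10]

# Rescaling: divide everything by 2.
halved = [x / 2 for x in raw]

# Recentring: shift the mean from 1.0 to about 0.1, spread unchanged.
centred = [x - 0.9 for x in raw]

print(f"CV raw:     {cv(raw):.3f}")
print(f"CV halved:  {cv(halved):.3f}")   # same as raw: scale-invariant
print(f"CV centred: {cv(centred):.3f}")  # much larger: not location-invariant
print(f"SD raw vs centred: {statistics.stdev(raw):.3f} vs {statistics.stdev(centred):.3f}")
```

The standard deviation of the replicates is the same before and after the shift; only the denominator of the CV has changed.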

ADD REPLY • link 18.9 years ago Claus Mayer ▴ 340

Hi Claus,

thanks for pointing this out. This slipped through because the standard deviation of log(x) is approximately equal to the CV of x if the latter is not too large (this is seen from a first-order expansion), so when I talked about "CV of replicates" I meant the standard deviation of their log-ratios. However, in his mail Jakob referred to "CV of log-ratios", and you are absolutely right: these are not appropriate.

Best wishes
Wolfgang
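
The first-order approximation mentioned in this reply follows from log(x) ≈ log(m) + (x - m)/m around the mean m, so that SD(log x) ≈ SD(x)/m = CV(x). A minimal Python sketch with made-up intensity values shows the agreement for a tight sample and the breakdown for a widely spread one. It assumes the natural logarithm; with base-2 logs, as is common for microarray log-ratios, the SD picks up an extra factor of 1/ln 2. The names `cv` and `sd_of_log` are hypothetical.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def sd_of_log(values):
    """Sample SD of the natural logarithm of the values."""
    return statistics.stdev([math.log(x) for x in values])


# Hypothetical positive intensities (made-up numbers).
tight = [950.0, 1000.0, 1050.0, 980.0, 1020.0]   # small CV
wide = [200.0, 1000.0, 3000.0, 500.0, 1500.0]    # large CV

for label, sample in (("tight", tight), ("wide", wide)):
    print(f"{label}: CV = {cv(sample):.4f}, SD of log = {sd_of_log(sample):.4f}")
```

For the tight sample the two quantities agree closely; for the wide sample they differ noticeably, which is the "if the latter is not too large" caveat.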

ADD REPLY • link 18.9 years ago Wolfgang Huber ★ 13k