vsn2 and print-tips


Hans-Ulrich Klein ▴ 330

@hans-ulrich-klein-1945


United States

Dear all,

I use the vsn2 method to normalize single-colour arrays with 48 print-tips (25*26 oligos per print-tip). After normalization, the intensities of the 48 print-tips fall in different ranges: the print-tip grid is clearly visible in false color representations of the arrays' spatial distributions of feature intensities. However, the scale and location of the intensities of a given print-tip do not change across arrays.

The man page of vsn2 says:

"The data are returned on a glog scale to base 2. More precisely, the transformed data are subject to the transformation glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) = asinh(u)/log(2) is called the generalised logarithm, a and b are the fitted model parameters (see references), f is a parameter transformation [4], and the overall constant offset c is computed from b such that for large x the transformation approximately corresponds to the log2 function."

Maybe there are not enough "large x" in some print-tips due to missing values in my data. I observed that reducing the number of oligos leads to even larger differences in the print-tip offsets. Are there parameters that influence the computation of c? Has anyone else observed this problem? The older "vsn" function does not lead to different print-tip offsets.

Regards,
Hans-Ulrich

    > sessionInfo()
    R version 2.6.2 (2008-02-08)
    x86_64-pc-linux-gnu

    locale:
    C

    attached base packages:
    [1] tools stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] vsn_3.2.1 limma_2.12.0 affy_1.16.0
    [4] preprocessCore_1.0.0 affyio_1.6.1 Biobase_1.16.3

    loaded via a namespace (and not attached):
    [1] grid_2.6.2 lattice_0.17-4 rcompgen_0.1-17
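For readers who want to check the glog2 definition quoted from the man page, here is a minimal numerical sketch. It is written in plain Python rather than R and is not taken from the vsn package; the function names `glog2` and `glog2_asinh` are chosen for this illustration only. It compares the two equivalent forms of the generalised logarithm and shows how close glog2(u) gets to log2(u) + 1 as u grows.

```python
import math


def glog2(u: float) -> float:
    """Generalised logarithm to base 2: log2(u + sqrt(u*u + 1))."""
    return math.log2(u + math.sqrt(u * u + 1.0))


def glog2_asinh(u: float) -> float:
    """The same function written as asinh(u) / log(2)."""
    return math.asinh(u) / math.log(2.0)


# Compare both forms, and the gap to log2(u) + 1, for a few non-negative u.
for u in (1.0, 100.0, 1e4, 1e6):
    gap = glog2(u) - (math.log2(u) + 1.0)
    print(f"u={u:g}  glog2={glog2(u):.6f}  asinh form={glog2_asinh(u):.6f}  gap={gap:.2e}")
```

The gap shrinks quickly with u, which is why the man page can describe the transformation as approximately log2 for large x once the constant terms have been absorbed into c.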

Normalization • 1.2k views

ADD COMMENT • link updated 16.1 years ago by Wolfgang Huber ★ 13k • written 16.1 years ago by Hans-Ulrich Klein ▴ 330


Wolfgang Huber ★ 13k

@wolfgang-huber-3550


EMBL European Molecular Biology Laborat…

Dear Hans-Ulrich,

thank you for your thoughtful message! The executive summary: this is indeed an (unintended) difference between vsn and vsn2, and I will update vsn2 before the next release. It only affects applications with multiple strata (print-tip groups).

Background: the error and normalisation model of vsn is invariant under an overall scaling of the data: if you multiply all intensities by a factor of 10, you will get the same output, except for an overall shift of log2(10) on the glog2 scale. This makes sense because microarray data don't have units, and a value of "200" can mean very different things on, say, an Affymetrix genechip and on a custom-made array.

This explains why there is this 'arbitrary' offset c. It is computed through an explicit formula from the b's (i.e. the scale factors), hence whether your actual data contain instances of large x does not matter directly (it may matter indirectly, by affecting how the b's are estimated). For x -> infinity, the function glog2(f(b)*x+a) approaches log2(x) + log2(f(b)) + log2(2), and c is computed to cancel out the last two terms, so that for large x the net transformation resembles log2(x). There is one b for each array and stratum (= print-tip group). The current implementation of vsn2 computes a single value c by taking the mean of log2(f(b)) + log2(2) across all strata and arrays. The old vsn computed c from the b's of the first array only, but separately for each stratum.

I had not anticipated that the difference between strata could matter so much, but given your observations, and with more thought about it, it does make sense. I will update vsn2 to compute c by averaging over the arrays, but separately for each stratum.
Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
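The effect of pooling the offset across strata, as described in the reply above, can be sketched numerically. This is an illustration in plain Python, not the vsn implementation: the f(b) values, the additive parameter `a`, and the stratum names are made up, and the sign convention c = -mean(log2(f(b)) + log2(2)) is an assumption inferred from the statement that c cancels those two terms.

```python
import math
from statistics import mean


def glog2(u: float) -> float:
    """Generalised logarithm to base 2."""
    return math.log2(u + math.sqrt(u * u + 1.0))


# Hypothetical scale factors f(b): one list per stratum, one entry per array.
fb = {
    "tip1": [0.010, 0.012, 0.011],
    "tip2": [0.040, 0.045, 0.042],
}
a = 0.5  # hypothetical additive parameter; negligible for large x


def offset_terms(values):
    """The two terms that c is meant to cancel: log2(f(b)) + log2(2)."""
    return [math.log2(v) + 1.0 for v in values]


# One pooled offset over all strata and arrays (the vsn2 behaviour described above).
c_pooled = -mean(t for vals in fb.values() for t in offset_terms(vals))

# One offset per stratum, averaged over arrays (the announced fix).
c_by_stratum = {s: -mean(offset_terms(vals)) for s, vals in fb.items()}


def residuals(offset_for, x=1e8):
    """Per-stratum mean of glog2(f(b)*x + a) + c - log2(x) for a large x."""
    return {
        s: mean(glog2(v * x + a) + offset_for(s) - math.log2(x) for v in vals)
        for s, vals in fb.items()
    }


res_pooled = residuals(lambda s: c_pooled)
res_stratum = residuals(lambda s: c_by_stratum[s])
print("pooled c:     ", res_pooled)
print("per-stratum c:", res_stratum)
```

With a pooled c, each stratum keeps a constant shift relative to log2(x), equal to the difference between its own mean log2(f(b)) and the overall mean, which would appear as a print-tip pattern that is stable across arrays. With per-stratum offsets, the residuals are close to zero in every stratum.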

ADD COMMENT • link 16.1 years ago Wolfgang Huber ★ 13k


Dear Hans-Ulrich et al.,

vsn >= 3.4.11 now computes separate offsets for different strata (e.g. print-tip groups), as suggested by Hans-Ulrich earlier in this thread. It is available at http://www.bioconductor.org/packages/2.2/bioc/html/vsn.html

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
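One simple way to check whether print-tip offsets remain after normalization is to compare per-stratum medians of the normalized values on a single array. The following is a minimal sketch in plain Python, not part of vsn: the helper name `stratum_medians` and the toy values are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math
from collections import defaultdict
from statistics import median


def stratum_medians(values, strata):
    """Median of the normalized values per stratum, skipping missing entries."""
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        if v is not None and not math.isnan(v):
            groups[s].append(v)
    return {s: median(vs) for s, vs in groups.items()}


# Toy normalized intensities for one array with two print-tip groups (made up).
values = [7.9, 8.1, float("nan"), 8.0, 9.0, 9.2, 8.8, 9.1]
strata = [1, 1, 1, 1, 2, 2, 2, 2]

med = stratum_medians(values, strata)
spread = max(med.values()) - min(med.values())
print(med, round(spread, 3))
```

A spread that is large relative to the within-stratum variation, and that repeats across arrays, would point to a stratum-specific offset rather than to biology.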

ADD REPLY • link 16.1 years ago Wolfgang Huber ★ 13k
