VSN 2.2


Hans-Ulrich Klein ▴ 330

@hans-ulrich-klein-1945


United States

Dear all,

I want to normalize a matrix of gene expression values G using VSN. With VSN 1.x I did

> vsnRes <- vsn(G)
> Gvsn_old <- exprs(vsnRes) / log(2)

This still works with vsn 2.2 (apart from deprecation warnings) and computes exactly the same values as the older version of vsn. Now, with vsn 2.2, I type

> vsnFit <- vsn2(G)
> Gvsn_new <- predict(vsnFit, G)

In both cases the normalization works. However, I noticed that the values are slightly different. It is not worth worrying about, but I would appreciate it if someone could explain the reason to me, just to satisfy my curiosity.

> Gvsn_new[1:5,1:3] - Gvsn_old[1:5,1:3]
           01N         01T          03N
[1,] 0.1387763 -0.15328649 -0.011482728
[2,] 0.1398214 -0.09982814 -0.005172216
[3,] 0.1394453 -0.04704412  0.128297523
[4,] 0.1389546 -0.08527670  0.088480265
[5,] 0.1396382 -0.03739234  0.144517027

Regards,
Hans-Ulrich

Normalization vsn • 1.6k views


Wolfgang Huber ★ 13k

@wolfgang-huber-3550


EMBL European Molecular Biology Laborat…

Dear Hans-Ulrich,

the function "vsn" has not changed; it is identical to previous releases. If there is demand, I can leave it there for quite a while.

"vsn2" optimizes the same likelihood function, but the implementation is different (and, I hope, better). There will be numerical differences, but they should not be consequential. If someone does discover substantial differences, please tell me; this should in general not happen. However:

1. I did change the way the overall baseline (additive offset) of the result is computed (see the "Value" section of the vsn2 man page). Before, it was based on array number 1; now it is based on a mean of all arrays.

2. The likelihood function can sometimes be quite flat, and the maximum found by numerical optimization can vary. In these cases, different normalization parameter values are almost as good as each other, i.e. the result may differ numerically but not in a consequential manner.

3. In some (ill-determined) cases, the optimizer may run off into outer space and converge to a meaningless local maximum...

Some quality control and sanity checking of the result (function "meanSdPlot") is always recommended.

Hope this helps; please let me know. What do the scatterplots of old versus new glog-ratios look like?

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
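The sanity check recommended in this reply can be approximated in base R. The following is only a sketch of the idea behind a mean-versus-sd plot, not the vsn function "meanSdPlot" itself. The matrix H and all variable names are simulated stand-ins (assumptions), not data from this thread.

```r
# Base-R sketch of a mean-versus-sd check on simulated data.
# H is a hypothetical normalized matrix, not output of vsn or vsn2.
set.seed(1)
n_probes <- 2000
n_arrays <- 5

# Probe-specific mean plus noise whose sd does not depend on the mean.
probe_mean <- runif(n_probes, min = 8, max = 16)
H <- matrix(probe_mean, nrow = n_probes, ncol = n_arrays) +
  matrix(rnorm(n_probes * n_arrays, sd = 0.3), nrow = n_probes)

row_mean <- rowMeans(H)
row_sd   <- apply(H, 1, sd)

# Bin probes by the rank of their mean and take the median sd in each bin.
bin <- cut(rank(row_mean), breaks = 10, labels = FALSE)
sd_by_bin <- tapply(row_sd, bin, median)
print(round(sd_by_bin, 3))

# A roughly horizontal red line suggests the variance is independent of the mean.
plot(rank(row_mean), row_sd, pch = ".", xlab = "rank(mean)", ylab = "sd")
lines(tapply(rank(row_mean), bin, median), sd_by_bin, col = "red", lwd = 2)
```

If the binned medians drift upwards or downwards with the rank of the mean, the variance stabilization is probably not adequate for that data set.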


Dear Wolfgang,

thank you very much for your fast and detailed response!

Wolfgang Huber wrote:
> [...]
> Hope this helps, please let me know,
> How do the scatterplots of old versus new glog-ratios look like?

I plotted old versus new glog-intensities instead of ratios (only the green channel has been used on the chips). All points are located close to a straight line with a small positive intercept (probably due to the new additive offset computation) and slope 1. The difference between both implementations is really negligible in this case.

Hans-Ulrich
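The "straight line with slope 1 and small intercept" observation can be checked numerically as well as by eye. The sketch below uses simulated vectors "old" and "new" (hypothetical stand-ins, with an assumed offset of 0.14 and small noise) in place of the real glog intensities, and fits a line with lm.

```r
# Sketch: quantify how closely new values follow old values.
# 'old' and 'new' are simulated stand-ins, not the data from this thread.
set.seed(2)
old <- runif(1000, min = 8, max = 16)
new <- old + 0.14 + rnorm(1000, sd = 0.01)  # assumed constant offset plus noise

fit <- lm(new ~ old)
print(coef(fit))  # intercept estimates the offset, slope should be near 1

plot(old, new, pch = ".")
abline(fit, col = "red")   # fitted line
abline(0, 1, lty = 2)      # identity line for reference
```

With real data, one would replace "old" and "new" by the two normalized matrices converted to vectors, for instance with as.vector.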


Hans-Ulrich Klein ▴ 330


Hi all,

I noticed that the function "vsn2" is much slower than the function "vsn". It is probably not a general problem, but at least for my dataset the difference in computation time is remarkable:

> vsnResG <- vsn(RG$G[subSet,1:5], strata=RG$genes$Block[subSet]);
vsn: 25727 x 5 matrix (48 strata). 100% done.

Finished after ~30s.

> vsnFitG <- vsn2(RG$G[subSet,1:5], strata=RG$genes$Block[subSet])
vsn: 25727 x 5 matrix (48 strata). 100% done.

Finished after ~1h.

After transformation

> Gvsn_new <- predict(vsnFitG, RG$G[subSet,1:5])
> parsG <- preproc(description(vsnResG))$vsnParams
> Gvsn_old <- vsnh(RG$G[subSet,1:5] + 0, parsG, strata=RG$genes$Block[subSet])

I checked that the variance is independent of the mean, and plotted the new versus the old glog intensities:

> plot(Gvsn_new, Gvsn_old/log(2), pch=".")
> abline(0,1, col="red")

Here, the plot shows a couple of "stripes" with slope 1 and different intercepts. I uploaded the plot:
http://img504.imageshack.us/img504/4700/oldnewglogge0.png

I guess that the "stripes" are the 48 print tips (used to stratify the data). Thus, the different additive offsets should not influence further analysis (like limma).

Best wishes,
Hans-Ulrich
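The guess about the stripes can be illustrated with a small simulation. The sketch below assumes that the offset depends on the stratum only and is identical for all arrays; that is an assumption of the example, not something established in this thread. All names and numbers are hypothetical.

```r
# Sketch: stratum-specific additive offsets and between-array differences.
# Assumption: one offset per stratum, shared by all arrays.
set.seed(3)
n_strata    <- 48
per_stratum <- 10
n_arrays    <- 5
strata <- rep(seq_len(n_strata), each = per_stratum)

old <- matrix(runif(n_strata * per_stratum * n_arrays, min = 8, max = 16),
              ncol = n_arrays)
offset <- rnorm(n_strata, sd = 0.5)

# The offset vector has one entry per row, so it is added to every column.
new <- old + offset[strata]

# Estimate the intercept of each "stripe" from the data.
est_offset <- tapply(rowMeans(new - old), strata, median)

# Differences between two arrays for the same probe are unchanged.
d_old <- old[, 1] - old[, 2]
d_new <- new[, 1] - new[, 2]
print(all.equal(d_old, d_new))
```

If the offsets also differed between arrays within a stratum, the between-array differences would no longer cancel, so it seems worth inspecting the estimated offsets per array before relying on this argument.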


Wolfgang Huber ★ 13k


> Hans-Ulrich Klein wrote
> Thu May 3 15:50:40 CEST 2007
> I noticed that the function "vsn2" is much slower than the function
> "vsn". Probably it is not a general problem, but at least for my dataset
> the difference in computation time is remarkable:

Dear Hans-Ulrich,

please excuse the delayed reply; I was traveling. I would definitely like to follow up on this. There are some internal parameters of the likelihood optimiser that affect computation time, and which I have tightened (although I wouldn't have thought with such a drastic effect).

* Have others also experienced drastic increases in compute time?

* Internally in the vsn C code, the L-BFGS-B algorithm is given two parameters (among others) to decide on convergence. In vsn2, they are:
    factr = 4e4; maxit = 200000;
  In the old vsn, they were
    factr = 5e+7; maxit = 40000;
  factr controls the tolerance for when the optimiser thinks the target function is stationary and it has converged; the current setting is 1250 times more precise than previously. maxit is the maximal number of iterations.

After extensive simulations (see the vignette "Verifying and assessing...") I went for the much tighter settings in order to ensure convergence to a unique optimum even in cases where the optimal transformation is close to the normal logarithm (and in that limit the likelihood has one direction in which it is very flat). But there is a trade-off with computation time, and of course the trade-off really depends on the data.

Before making any more changes, I wonder whether you or other users have comments / wishes. Otherwise I would
-- expose the above parameters to the R-function interface, so people can set them
-- make the default a bit more lenient again.

For more, see below.

> [...]
> Here, the plot shows a couple of "stripes" with slope 1 and different
> intercepts.
> I uploaded the plot:
> http://img504.imageshack.us/img504/4700/oldnewglogge0.png
>
> I guess that the "stripes" are the 48 printtips (used to stratify the
> data). Thus, the different additive offsets should not influence further
> analysis (like limma).

Yes, but it is worrying that the ranges of the different stripes (strata) are much more similar (all between 8 and 16) in the "old" version than in the new one. Would you be able to send me your RG object so that I can better explore these questions?

Best wishes
Wolfgang
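The effect of factr and maxit can be tried out with base R's optim, which accepts both as control parameters for method "L-BFGS-B". The sketch below is not vsn's internal C code; it uses the Rosenbrock function as a stand-in objective (an assumption for illustration) and simply plugs in the two parameter pairs quoted in this reply.

```r
# Sketch: compare looser and tighter L-BFGS-B convergence settings in optim.
# The Rosenbrock function is a stand-in, not the vsn likelihood.
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))
start <- c(-1.2, 1)

# Settings quoted for the old vsn.
fit_loose <- optim(start, fr, grr, method = "L-BFGS-B",
                   control = list(factr = 5e7, maxit = 40000))

# Settings quoted for vsn2.
fit_tight <- optim(start, fr, grr, method = "L-BFGS-B",
                   control = list(factr = 4e4, maxit = 200000))

# The objective tolerance is factr times the machine epsilon.
print(c(loose = 5e7, tight = 4e4) * .Machine$double.eps)

# Compare the final objective value and the number of function evaluations.
print(rbind(
  loose = c(value = fit_loose$value, fn_calls = fit_loose$counts[["function"]]),
  tight = c(value = fit_tight$value, fn_calls = fit_tight$counts[["function"]])
))
```

On a well-behaved toy function the extra cost of the tighter setting is small; how much it matters for a likelihood with a very flat direction depends on the data, as the reply notes.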
