Question: A couple VSN related questions
0
gravatar for Leo J. LEE
12.0 years ago by
Leo J. LEE20
Leo J. LEE20 wrote:
Dear all, I am new to Bioconductor and R (having the latest version, Bioconductor 2.1 and R 2.6.1, installed) so please bear with me if I ask something naive/stupid. Anyway, the most important reason why I come to Bioconductor is to use the vsn package, and after trying it a bit, I have the following questions: 1. It is clear from the documentation (and a previous email from Wolfgang) that the real computation of vsn2 is written in C. Is it possible for me to access the C source code (and compile and run it myself)? If yes, this will be truly great since I have to do other computational analysis outside of R anyway. 2. I am trying to follow the computation performed by VSN as documented in the "Introduction to VSN" vignette. Loading the kidney dataset, I have the following data before and after VSN: > exprs(kidney)[1:10,] channels spots green red 1 815.32 937.02 2 671.66 765.03 3 713.93 713.59 4 703.97 656.70 5 493.59 472.10 6 477.92 453.13 7 346.55 385.47 8 377.34 421.86 9 132.80 121.77 10 148.65 130.41 > fit = vsn2(kidney) vsn: 8704 x 2 matrix (1 stratum). 100% done. > fit at hx[1:10,] channels spots green red 1 9.711781 9.870624 2 9.453765 9.597638 3 9.533994 9.506128 4 9.515436 9.398666 5 9.067212 8.995503 6 9.028838 8.948561 7 8.673814 8.771463 8 8.762664 8.868634 9 7.957576 7.926483 10 8.016070 7.957696 Now I am trying to reproduce the result from parameters estimated by VSN using > coef(fit)[1,,] [,1] [,2] [1,] -0.08444849 -5.927967 [2,] -0.06787093 -5.957545 > offset <- coef(fit)[1,,1]; > scale <- exp(coef(fit)[1,,2]); > ynorm_my = asinh(exprs(kidney)*scale + offset); > ynorm_my[1:10,] channels spots green red 1 1.4820851 1.6139176 2 1.2851048 1.4029668 3 1.3588520 1.3584152 4 1.3272734 1.2650499 5 1.0353039 0.9986861 6 0.9954236 0.9530607 7 0.7626212 0.8400535 8 0.8148209 0.8976601 9 0.2661626 0.2376892 10 0.3115130 0.2662458 which is clearly different from the result produced by VSN. Could anybody tell me what I have done wrong? Thank you so much! -- Leo http://www.psi.toronto.edu/~ljlee/
kidney vsn • 430 views
ADD COMMENTlink modified 12.0 years ago by Wolfgang Huber13k • written 12.0 years ago by Leo J. LEE20
Answer: A couple VSN related questions
0
gravatar for Wolfgang Huber
12.0 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:
Dear Leo, answers please see below. > I am new to Bioconductor and R (having the latest version, Bioconductor 2.1 > and R 2.6.1, installed) so please bear with me if I ask something > naive/stupid. Anyway, the most important reason why I come to Bioconductor > is to use the vsn package, and after trying it a bit, I have the following > questions: > > 1. It is clear from the documentation (and a previous email from Wolfgang) > that the real computation of vsn2 is written in C. Is it possible for me to > access the C source code (and compile and run it myself)? If yes, this will > be truly great since I have to do other computational analysis outside of R > anyway. Yes, it is all open source. The most obvious way is to download the .tar.gz file from http://www.bioconductor.org/packages/devel/bioc/html/vsn.html and unpack it. See the file vsn2.c. There is also public, anonymous svn access, which is described on the bioc website. > 2. I am trying to follow the computation performed by VSN as documented in > the "Introduction to VSN" vignette. Loading the kidney dataset, I have the > following data before and after VSN: >> exprs(kidney)[1:10,] > channels > spots green red > 1 815.32 937.02 > 2 671.66 765.03 > 3 713.93 713.59 > 4 703.97 656.70 > 5 493.59 472.10 > 6 477.92 453.13 > 7 346.55 385.47 > 8 377.34 421.86 > 9 132.80 121.77 > 10 148.65 130.41 >> fit = vsn2(kidney) > vsn: 8704 x 2 matrix (1 stratum). 100% done. >> fit at hx[1:10,] > channels > spots green red > 1 9.711781 9.870624 > 2 9.453765 9.597638 > 3 9.533994 9.506128 > 4 9.515436 9.398666 > 5 9.067212 8.995503 > 6 9.028838 8.948561 > 7 8.673814 8.771463 > 8 8.762664 8.868634 > 9 7.957576 7.926483 > 10 8.016070 7.957696 > > Now I am trying to reproduce the result from parameters estimated by VSN > using >> coef(fit)[1,,] > [,1] [,2] > [1,] -0.08444849 -5.927967 > [2,] -0.06787093 -5.957545 >> offset <- coef(fit)[1,,1]; >> scale <- exp(coef(fit)[1,,2]); >> ynorm_my = asinh(exprs(kidney)*scale + offset); >> ynorm_my[1:10,] > channels > spots green red > 1 1.4820851 1.6139176 > 2 1.2851048 1.4029668 > 3 1.3588520 1.3584152 > 4 1.3272734 1.2650499 > 5 1.0353039 0.9986861 > 6 0.9954236 0.9530607 > 7 0.7626212 0.8400535 > 8 0.8148209 0.8976601 > 9 0.2661626 0.2376892 > 10 0.3115130 0.2662458 > > which is clearly different from the result produced by VSN. Could anybody > tell me what I have done wrong? Thank you so much! Try this: library("vsn") data("kidney") fit = vsn2(kidney) shift = coef(fit)[1,,1] scale = exp(coef(fit)[1,,2]) nk = t(asinh(t(exprs(kidney))*scale + shift)) nk = nk/log(2) - fit at hoffset identical(nk , exprs(fit)) [1] TRUE This is a subtle point. It is mentioned (admittedly too briefly) in the vsn2 man page, section "Note on overall scale and location of the glog transformation": The data are returned on a glog scale to base 2. More precisely, the transformed data are subject to the transformation glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) = asinh(u)/log(2) is called the generalised logarithm, a and b are the fitted model parameters (see references), f is a parameter transformation [4], and the overall constant offset c is computed from b such that for large x the transformation approximately corresponds to the log2 function. The offset c is inconsequential for all differential expression calculations, but many users like to see the data in a range that they are familiar with. So the difference to what you did is (i) to go from glog to glog_2 scale (the division by log(2)), and (ii) to subtract fit at hoffset. The computation of fit at hoffset is done towards the end of the "vsnMatrix" function. Best wishes Wolfgang R version 2.7.0 Under development (unstable) (2008-01-29 r44233) x86_64-unknown-linux-gnu locale: C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] vsn_3.4.5 limma_2.13.3 affy_1.17.3 [4] preprocessCore_1.1.5 affyio_1.7.9 Biobase_1.17.8 [7] fortunes_1.3-4 loaded via a namespace (and not attached): [1] grid_2.7.0 lattice_0.17-4
ADD COMMENTlink written 12.0 years ago by Wolfgang Huber13k
Dear Wolfgang, Thank you very much for your thorough and clear answer! I have an additional question: is the "sigsq" slot of a vsn object the estimated variance (which the name and actual value seem to indicate) or the standard deviation (as documented on the man page)? Also is it before or after the log(2) transformation (I guess it is before, but want to make sure)? Thanks again for taking your time to answer my questions! Cheers, -- Leo -----Original Message----- From: Wolfgang Huber [mailto:huber@ebi.ac.uk] Sent: Wednesday, January 30, 2008 7:55 PM To: Leo J. LEE Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] A couple VSN related questions Dear Leo, answers please see below. > I am new to Bioconductor and R (having the latest version, Bioconductor 2.1 > and R 2.6.1, installed) so please bear with me if I ask something > naive/stupid. Anyway, the most important reason why I come to Bioconductor > is to use the vsn package, and after trying it a bit, I have the following > questions: > > 1. It is clear from the documentation (and a previous email from Wolfgang) > that the real computation of vsn2 is written in C. Is it possible for me to > access the C source code (and compile and run it myself)? If yes, this will > be truly great since I have to do other computational analysis outside of R > anyway. Yes, it is all open source. The most obvious way is to download the .tar.gz file from http://www.bioconductor.org/packages/devel/bioc/html/vsn.html and unpack it. See the file vsn2.c. There is also public, anonymous svn access, which is described on the bioc website. > 2. I am trying to follow the computation performed by VSN as documented in > the "Introduction to VSN" vignette. Loading the kidney dataset, I have the > following data before and after VSN: >> exprs(kidney)[1:10,] > channels > spots green red > 1 815.32 937.02 > 2 671.66 765.03 > 3 713.93 713.59 > 4 703.97 656.70 > 5 493.59 472.10 > 6 477.92 453.13 > 7 346.55 385.47 > 8 377.34 421.86 > 9 132.80 121.77 > 10 148.65 130.41 >> fit = vsn2(kidney) > vsn: 8704 x 2 matrix (1 stratum). 100% done. >> fit at hx[1:10,] > channels > spots green red > 1 9.711781 9.870624 > 2 9.453765 9.597638 > 3 9.533994 9.506128 > 4 9.515436 9.398666 > 5 9.067212 8.995503 > 6 9.028838 8.948561 > 7 8.673814 8.771463 > 8 8.762664 8.868634 > 9 7.957576 7.926483 > 10 8.016070 7.957696 > > Now I am trying to reproduce the result from parameters estimated by VSN > using >> coef(fit)[1,,] > [,1] [,2] > [1,] -0.08444849 -5.927967 > [2,] -0.06787093 -5.957545 >> offset <- coef(fit)[1,,1]; >> scale <- exp(coef(fit)[1,,2]); >> ynorm_my = asinh(exprs(kidney)*scale + offset); >> ynorm_my[1:10,] > channels > spots green red > 1 1.4820851 1.6139176 > 2 1.2851048 1.4029668 > 3 1.3588520 1.3584152 > 4 1.3272734 1.2650499 > 5 1.0353039 0.9986861 > 6 0.9954236 0.9530607 > 7 0.7626212 0.8400535 > 8 0.8148209 0.8976601 > 9 0.2661626 0.2376892 > 10 0.3115130 0.2662458 > > which is clearly different from the result produced by VSN. Could anybody > tell me what I have done wrong? Thank you so much! Try this: library("vsn") data("kidney") fit = vsn2(kidney) shift = coef(fit)[1,,1] scale = exp(coef(fit)[1,,2]) nk = t(asinh(t(exprs(kidney))*scale + shift)) nk = nk/log(2) - fit at hoffset identical(nk , exprs(fit)) [1] TRUE This is a subtle point. It is mentioned (admittedly too briefly) in the vsn2 man page, section "Note on overall scale and location of the glog transformation": The data are returned on a glog scale to base 2. More precisely, the transformed data are subject to the transformation glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) = asinh(u)/log(2) is called the generalised logarithm, a and b are the fitted model parameters (see references), f is a parameter transformation [4], and the overall constant offset c is computed from b such that for large x the transformation approximately corresponds to the log2 function. The offset c is inconsequential for all differential expression calculations, but many users like to see the data in a range that they are familiar with. So the difference to what you did is (i) to go from glog to glog_2 scale (the division by log(2)), and (ii) to subtract fit at hoffset. The computation of fit at hoffset is done towards the end of the "vsnMatrix" function. Best wishes Wolfgang
ADD REPLYlink written 12.0 years ago by Leo J. LEE20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 505 users visited in the last hour