Dear all,
I am using the R package VSN for normalising proteomics data (abundances) that then goes in limma for contrast analysis. I know that VSN was not designed to work with proteomics data but actually vsn2 is doing a great job and clearly stabilised the variance, see below

The issue I have, if that the results are slightly different depending on which platform I am running the R code.
Example MacOS
logFC AveExpr t P.Value adj.P.Val B1 1.5389102 12.82511 7.730600 2.590139e-06 0.002297454 4.745133352 1.7838522 17.67135 5.358662 1.149218e-04 0.047924259 1.443241183 1.0633257 13.62943 5.164849 1.620888e-04 0.047924259 1.132310594 1.5435723 12.16677 4.682596 3.896566e-04 0.086406353 0.333537635 -1.0275168 12.40099 -4.508445 5.386750e-04 0.095560937 0.03691621
Normalized abundances
control_24_3 control_24_4 control_24_5 control_24_6 control_24_7[1,] 13.07592 12.64651 12.24371 12.45800 12.00618[2,] 13.44991 14.09624 14.22835 13.85055 14.05933[3,] 16.24021 16.82461 16.19956 16.47579 17.68974[4,] 10.97001 10.35783 11.89993 10.56033 11.50507[5,] 14.25346 14.69884 14.11004 14.68608 13.98822
Debian 7
logFC AveExpr t P.Value adj.P.Val B1 1.5389434 12.82507 7.730738 2.589577e-06 0.002296955 4.745358152 1.7838631 17.67135 5.358692 1.149147e-04 0.047919273 1.443305653 1.0633463 13.62941 5.164902 1.620719e-04 0.047919273 1.132411084 1.5435961 12.16673 4.682718 3.895658e-04 0.086386209 0.333751165 -1.0275537 12.40094 -4.508525 5.385908e-04 0.095546006 0.03705773
the normalised abundances look slightly different
control_24_3 control_24_4 control_24_5 control_24_6 control_24_7 [1,] 13.07588 12.64647 12.24365 12.45795 12.00612 [2,] 13.44988 14.09622 14.22836 13.85052 14.05931 [3,] 16.24018 16.82459 16.19959 16.47577 17.68974 [4,] 10.96996 10.35768 11.89984 10.56020 11.50499 [5,] 14.25343 14.69882 14.11004 14.68606 13.98820
When the normalisation is disabled the results are identical on both GNU/Linux and MacOS. That’s why I am suspecting a rounding issue since the abundances are integers when they go in limma. However, the t-tests, logFC and avgExp in limma are completely wrong as the data have a huge variance.
I asked this question to Wolfgang Huber who advice me to check the point FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html ) and pointe out that VSN is using an iterative algorithm for a nonlinear optimisation.
For info, under MacOS:
sessionInfo() R version 3.2.1 (2015-06-18) Platform: x86_64-apple-darwin14.3.0 (64-bit) Running under: OS X 10.10.3 (Yosemite) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] qvalue_2.0.0 limma_3.24.10 vsn_3.36.0 Biobase_2.28.0 BiocGenerics_0.14.0 argparse_1.0.1 [7] proto_0.3-10 magrittr_1.5 dplyr_0.4.2 tidyr_0.2.0 reshape2_1.4.1 BiocInstaller_1.18.3 loaded via a namespace (and not attached): [1] Rcpp_0.11.6 plyr_1.8.3 tools_3.2.1 zlibbioc_1.14.0 digest_0.6.8 [6] preprocessCore_1.30.0 gtable_0.1.2 lattice_0.20-31 DBI_0.3.1 findpython_1.0.1 [11] stringr_1.0.0 grid_3.2.1 getopt_1.20.0 R6_2.0.1 ggplot2_1.0.1 [16] scales_0.2.5 MASS_7.3-40 splines_3.2.1 assertthat_0.1 colorspace_1.2-6 [21] stringi_0.5-2 affy_1.46.1 munsell_0.4.2 rjson_0.2.15 affyio_1.36.0
Debian, packages and R are of same versions
R version 3.2.1 (2015-06-18)Platform: x86_64-pc-linux-gnu (64-bit)Running under: Debian GNU/Linux 7 (wheezy)other attached packages:[1] qvalue_2.0.0 limma_3.24.10 vsn_3.36.0[4] Biobase_2.28.0 BiocGenerics_0.14.0 argparse_1.0.1[7] proto_0.3-10 magrittr_1.5 dplyr_0.4.2[10] tidyr_0.2.0 reshape2_1.4.1loaded via a namespace (and not attached):[1] Rcpp_0.11.6 BiocInstaller_1.18.3 plyr_1.8.3[4] tools_3.2.1 zlibbioc_1.14.0 digest_0.6.8[7] preprocessCore_1.30.0 gtable_0.1.2 lattice_0.20-31[10] DBI_0.3.1 findpython_1.0.1 stringr_1.0.0[13] grid_3.2.1 getopt_1.20.0 R6_2.0.1[16] ggplot2_1.0.1 scales_0.2.5 MASS_7.3-41[19] splines_3.2.1 assertthat_0.1 colorspace_1.2-6[22] stringi_0.4-1 affy_1.46.1 munsell_0.4.2[25] rjson_0.2.15 affyio_1.36.0
If you have experienced this, and have some comments / suggestions, I will be happy to update, correct my script.
Many thanks in advance,
Aurelien





Dear Wolfgang,
thanks again for your help. In the tables I first provided, both were processed with vsn2, the numbers are indeed very close, but at the scale of the whole dataset, when we applied a cut-off on the q-values we did not retain the same number of proteins depending on the platform.
When we don't normalized, the limma output is just weird to me. For example
A test case code, together with a dataset sample are available at github here https://github.com/ginolhac/vsn_proteomics for anyone who wants to try it out.
Best wishes,
Aurelien