Question

vsn and scaling

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 10 days ago

EMBL European Molecular Biology Laborat…

Dear Roslin, I have now had a look at your data (which you sent me off-list). The quick bottomline for hurried listreaders is: consider the parameter "lts.quantile" and try different choices from the default, especially if you are looking at quite variable data! Now in detail. There are two issues: 1. your two datasets not only differ in the order of the arrays (columns), but also in the set of features (rows of the data matrix): the first one has 3884, the second has 3746. Out of these, 3588 are in common, 296 are only in the first and 158 are only in the second. 2. the data appear relatively noisy, and hence the precision of the estimated normalization parameters is not great. For the purpose of comparison, (1) can be addressed by fitting vsn only to the common features (you can then apply the fit to all using the predict method). (2) can often be addressed by varying the parameter "lts.quantile". Its defafult value 0.9 in vsn2 is probably too optimistic for your data (it assumes less variability between arrays), and in the example code below I set it to 0.5. With that, the results for R10001 and R10002 are very similar (see commonfeatures.png). The output of vsn2 can depend sensitively on the input (e.g. the order of the arrays or the addition/omission of a few rows) BUT if that happens, it indicates a problem, either because the vsn model is not appropriate for the data or the numerical optimisation algorithm failed to find the optimal fit. In such cases, tuning of the parameters (in particular, lts.quantile and pstart) can help a lot. Then, in some cases (and I think this applies to yours), the results can seemingly differ, but they could both be within the range of the plausible. Think of them as two different estimates for the same underlying parameter, which are both consistent with the available data and the model. This can happen if the likelihood landscape is very shallow / the model is overparameterized. In particular, this will happen for vsn2 when the relative size of additive noise is small, "glog" is overkill, and "log" would do just as well. The resulting plots ans script are also here: http://www.ebi.ac.uk/~huber/pub/roslin I hope this helps, please let me know. Best wishes Wolfgang --------------------------------------------------------------- library("vsn") x1 = as.matrix(read.table("R10001.vsn.filtered.raw.txt", sep="\t", quote="", header=TRUE, row.names=1L)) x2 = as.matrix(read.table("R10002.vsn.filtered.raw.txt", sep="\t", quote="", header=TRUE, row.names=1L)) common = intersect(rownames(x1), rownames(x2)) onlyx1 = setdiff(rownames(x1), rownames(x2)) onlyx2 = setdiff(rownames(x2), rownames(x1)) cat(length(common), length(onlyx1), length(onlyx2), "\n") ## 3588 296 158 myPlot = function(a, b, name) { png(paste(name, "png", sep="."), width=700, height=500) par(mfrow=c(2,3)) for(cn in colnames(a)){ plot(a[common, cn], b[common, cn], pch=".", main=cn) abline(a=0, b=1, col="blue") } dev.off() } myPlot(log2(x1), log2(x2), "beforevsn") ## with all features v1 = vsn2(x1, lts.quantile=0.5) v2 = vsn2(x2, lts.quantile=0.5) myPlot(exprs(v1), exprs(v2), "withallfeatures") ## with common features only w1 = vsn2(x1[common,], lts.quantile=0.5) w2 = vsn2(x2[common,], lts.quantile=0.5) myPlot(exprs(w1), exprs(w2), "commonfeatures") ------------------------------------------------------------------- > sessionInfo() R version 2.6.0 Patched (2007-10-03 r43075) i686-pc-linux-gnu locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB .UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB. UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8 ;LC_IDENTIFICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] vsn_3.3.1 limma_2.11.14 affy_1.15.12 [4] preprocessCore_0.99.22 affyio_1.5.11 Biobase_1.15.36 [7] fortunes_1.3-3 loaded via a namespace (and not attached): [1] grid_2.6.0 lattice_0.16-5 rcompgen_0.1-15 ------------------------------------------------------------------- Wolfgang Huber ha scritto: > Dear Roslin, > >> Am I correct in saying that the default setting of vsn2 is to scale >> everything to the first array? > > This is not the case. The model that vsn fits is invariant under > permutation of the arrays (you can look at the vignette "Likelihood > Calculations for vsn" if you are interested in details). > > However, the fitting does includes an iterative numeric optimisation; if > that fails, and runs into a local optimum, such a behaviour can depend > sensitively on the data (not just the order of the arrays). > >> Could this be potentially problematic? For example, we have observed >> strange results when we change the order of loading of the data into >> vsn2. The resulting normalised data looks different when vieved in an >> MA plot. > > Can you send me an example that I can reproduce? Such examples would be > helpful for me to improve the diagnostics that can be done by vsn2, > either automatically, or in a subsequent quality control step. > > Best wishes > Wolfgang > > ------------------------------------------------------------------ > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >

Normalization vsn Normalization vsn • 660 views

ADD COMMENT • link 16.6 years ago Wolfgang Huber ★ 13k