Question

predict vsn with reference

chrisk ▴ 10

I'm having difficulty using the 'reference' argument of vsn to put data from a new microarray onto the scale of an existing set of arrays, when all the arrays are normalised using a shared set of controls.

I think I'm not understanding the way offsets are handled: when a reference is used, predicted values for the data used to create a vsn object differ from the values stored in that vsn object. For example, if I have data from 2 arrays in 'a' and want to put array b onto their scale, this is what I'm doing:

library(vsn)
set.seed(214)
vals<-runif(1000)
a<-matrix(rep(vals,2)+0.1*rnorm(2000),1000,2)
b<-vals+0.1*rnorm(1000)
aVsn<-vsn2(a)
bVsn<-vsn2(b,reference=aVsn)

The values stored in bVsn are now on the same scale as the 'a' arrays:

plot(exprs(aVsn)[,2],exprs(bVsn)); abline(0,1)

However, the predictions from bVsn, using the data b, are offset from these values:

plot(exprs(bVsn),predict(bVsn,b)); abline(0,1)

This is an issue when these comparable spots are only a reference set of probes for a larger array:

aFull<-rbind(a,matrix(runif(20000),10000,2))
bFull<-c(b,runif(10000))

I've been calculating values for the 'a' arrays using:

aFullVal<-predict(aVsn,aFull)

but if I use the same approach for the b array, I am no longer on the same scale as the 'a' arrays:

bFullVal<-predict(bVsn,bFull)

plot(aFullVal[1:1000,1],bFullVal[1:1000,1]); abline(0,1)

I can get back to the scale by adding the mean difference between the stored and the predicted values:

offset<-mean(exprs(bVsn)-predict(bVsn,b))
bFullVal2<-bFullVal+offset
plot(aFullVal[1:1000,1],bFullVal2[1:1000,1]); abline(0,1)

But I don't really understand what this offset is or where it comes from (particularly in this toy example, where the offset is much larger than any real difference between a and b, though I guess I haven't put in anything that actually needs variance stabilisation).

So it would be good to know:

i) whether correcting by whatever the offset turns out to be is a reasonable approach (especially when b actually comprises several arrays)?

ii) whether there is any less arbitrary way to calculate values for array b while staying on the scale of the 'a' arrays (e.g. using parameter values directly)?

Any help much appreciated,

Chris

> sessionInfo()
R version 2.5.1 (2007-06-27)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "methods" "base"

other attached packages:
     vsn    limma     affy   affyio  Biobase
 "2.2.0" "2.10.5" "1.14.2"  "1.4.1" "1.14.1"

----------------------------------------------------------------------
Dr Christopher Knight
Manchester Interdisciplinary Biocentre, room 2.001
The University of Manchester
131 Princess Street
Manchester M1 7DN
UK
Tel: +44 (0)161 3065138
Fax: +44 (0)161 3064556
chris.knight at manchester.ac.uk
www.dbkgroup.org/MCISB/people/knight/
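For the case where b comprises several arrays, the workaround described above could be generalised to one offset per array, estimated on the shared reference probes only and then applied to every probe of that array. The following is a minimal base-R sketch under stated assumptions: toy_transform is a made-up stand-in for a glog-type transform and is not vsn's implementation, and the names estimate_offsets, apply_offsets and the shift of 1.5 are hypothetical. It only illustrates the bookkeeping; it does not say whether the correction is statistically appropriate.

```r
# Toy stand-in for a glog-type transform; NOT vsn's actual implementation.
# 'shift' plays the role of an additive constant that differs between the
# stored values and the predicted values.
toy_transform <- function(y, a = 0.1, b = 2, shift = 0) {
  asinh(a + b * y) / log(2) + shift
}

# Estimate one additive offset per array (column) from the reference probes.
estimate_offsets <- function(stored, predicted) {
  stored <- as.matrix(stored)
  predicted <- as.matrix(predicted)
  stopifnot(identical(dim(stored), dim(predicted)))
  colMeans(stored - predicted)
}

# Add each array's offset to every probe (row) of that array's column.
apply_offsets <- function(values, offsets) {
  values <- as.matrix(values)
  stopifnot(ncol(values) == length(offsets))
  sweep(values, 2, offsets, "+")
}

set.seed(214)
n_ref <- 1000                                   # shared reference probes
b_full <- matrix(runif(3 * 11000), ncol = 3)    # three hypothetical 'b' arrays
ref_rows <- seq_len(n_ref)

stored_ref <- toy_transform(b_full[ref_rows, ])       # plays the role of exprs(bVsn)
predicted_full <- toy_transform(b_full, shift = 1.5)  # plays the role of predict(bVsn, bFull)

offsets <- estimate_offsets(stored_ref, predicted_full[ref_rows, ])
corrected <- apply_offsets(predicted_full, offsets)
```

If the difference between stored and predicted values is not constant within an array (for example, if its standard deviation across the reference probes is clearly above zero), a single additive offset would not be enough and this sketch would not apply.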

Microarray affy vsn limma affyio • 871 views

updated 16.6 years ago by Wolfgang Huber ★ 13k • written 16.6 years ago by chrisk ▴ 10

score 0 · Answer 1 · 2007-10-03

Dear Chris,

thank you for this very useful feedback! You have indeed discovered an oversight in the "predict" function, which led to wrong results when the fit object had been obtained from a "by-reference" fit (I had not come across this use case before).

I have fixed this in version 3.3.1 of the package, which is posted here: http://www.ebi.ac.uk/~huber/pub

I need to see whether it can still be included in the BioC 2.1 release; otherwise it will shortly be in the new devel branch for 2.2.

There is also a little script, chris.R, which (as far as I understand) recapitulates the synthetic data example from your post, but with real data. The CCl4 package can be obtained with "biocLite", and the script's output is in the PNG file.

Please have a look whether this now fixes your problem!

> sessionInfo()
R version 2.6.0 RC (2007-10-01 r43050)
i686-pc-linux-gnu

attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] CCl4_1.0.6 vsn_3.3.1 limma_2.11.14
[4] affy_1.15.12 preprocessCore_0.99.22 affyio_1.5.11
[7] Biobase_1.15.36 fortunes_1.3-3

loaded via a namespace (and not attached):
[1] grid_2.6.0 lattice_0.16-5

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber
EBI/EMBL
Cambridge UK
http://www.ebi.ac.uk/huber
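One way to check whether stored and predicted values agree after updating the package is to summarise their difference numerically rather than only by eye from a plot. The sketch below is base R only; compare_scales is a hypothetical helper name and the numbers are made up for illustration. The commented-out last line shows how it might be called with the objects from the question, assuming vsn is loaded and those objects exist.

```r
# Summarise how far predicted values are from the stored ones.
# A mean far from 0 together with an sd near 0 points to a pure additive offset.
compare_scales <- function(stored, predicted) {
  d <- as.numeric(stored) - as.numeric(predicted)
  c(mean_diff = mean(d), sd_diff = sd(d), max_abs_diff = max(abs(d)))
}

# Toy illustration with made-up numbers (no vsn required):
stored <- c(7.1, 8.4, 9.9, 11.2)
agree <- compare_scales(stored, stored)        # identical inputs: no difference
shifted <- compare_scales(stored, stored + 2)  # constant offset of 2
print(agree)
print(shifted)

# With the objects from the question (assumes vsn is loaded):
# compare_scales(exprs(bVsn), predict(bVsn, b))
```

A mean difference close to zero and a small maximum absolute difference would suggest that predict and the stored values are on the same scale.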