Vsn transformation repeatability
1
0
Entering edit mode
@paolo-innocenti-2191
Last seen 9.6 years ago
Hi all, We are attempting to identify significantly differentially expressed genes between two treatments of female Drosophila melanogaster, using results from Affymetrix Drosophila2 microarray chip hybridization (8 chips, 4 per treatment). The end result is that the number of significantly differentially expressed genes for a particular FDR changes with each run, despite the input data and the program code remaining unchanged. We are using ?vsnrma? for pre-processing, which utilizes a search algorithm to find the maximum likelihood transformed values. The procedure produces slightly different calibrated GLOG2-transformed expression values on each run, despite the untransformed expression values and the R code being identical each time. Our first guess is that the search algorithm is unable to locate the likelihood peak and settles on a slightly different set of transformed values each time. Is this a possibility? If so, is there anything we can do to fix it? E.g. increase the number of iterations, or modify the search algorithm. Many thanks in advance. Paolo Innocenti > library(affy) Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. > library(vsn) Loading required package: lattice Loading required package: limma > data <- ReadAffy(celfile.path="./celfiles") > sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4") > eset1 <- vsnrma(data) 100% done.4 x 8 matrix (1 stratum). 0% done. Please use 'meanSdPlot' to verify the fit. Calculating Expression > eset2 <- vsnrma(data) 100% done.4 x 8 matrix (1 stratum). 0% done. Please use 'meanSdPlot' to verify the fit. Calculating Expression > exprs(eset1)[1,] z1 z2 z3 z4 m1 m2 m3 m4 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031 > exprs(eset2)[1,] z1 z2 z3 z4 m1 m2 m3 m4 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277 > head(exprs(eset1))==head(exprs(eset2)) z1 z2 z3 z4 m1 m2 m3 m4 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622893_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622894_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622895_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622896_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > sessionInfo() R version 2.8.0 (2008-10-20) i686-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] drosophila2cdf_2.3.0 vsn_3.8.0 limma_2.16.2 [4] lattice_0.17-15 affy_1.20.0 Biobase_2.2.0 loaded via a namespace (and not attached): [1] affyio_1.10.0 grid_2.8.0 preprocessCore_1.4.0 -- Paolo Innocenti Department of Animal Ecology, EBC Uppsala University Norbyv?gen 18D 75236 Uppsala, Sweden
Microarray Drosophila melanogaster drosophila2 Microarray Drosophila melanogaster drosophila2 • 1.1k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 11 days ago
EMBL European Molecular Biology Laborat…
Dear Paolo, thank you for your feedback! There are three issues: 1. You say "number of ... differentially expressed genes for a particular FDR changes with each run". By how much? The variation should be tiny (negligible?), otherwise I would be worried about the stability of your method to compute differential expressedness, and FDR. 2. "vsnrma" calls the "vsn2" method for "AffyBatch" objects, which by default estimates the transformation from a random subsample of the microarray features, see showMethods(f="vsn2", classes="AffyBatch", includeDefs=TRUE) Please see the code example below for how to deactivate the subsampling. This should be the only source of (pseudo)randomness, everything else (including the numerical optimisation) should be deterministic. 3. Of course, there is also "set.seed" (which is acceptable if you have convinced yourself about point 1 above.) Best wishes Wolfgang library("vsn") library("affydata") data("Dilution") u1 = vsnrma(Dilution) u2 = vsnrma(Dilution) identical(exprs(u1), exprs(u2)) # [1] FALSE u3 = vsnrma(Dilution, subsample=0) u4 = vsnrma(Dilution, subsample=0) identical(exprs(u3), exprs(u4)) # [1] TRUE sessionInfo() R version 2.9.0 Under development (unstable) (2008-10-28 r46793) x86_64-unknown-linux-gnu locale: LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT .UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] hgu95av2cdf_2.2.0 affydata_1.11.3 vsn_3.9.2 [4] limma_2.14.7 affy_1.18.2 preprocessCore_1.2.1 [7] affyio_1.11.0 Biobase_2.0.1 lattice_0.17-15 [10] fortunes_1.3-5 loaded via a namespace (and not attached): [1] grid_2.9.0 ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber 29/10/2008 10:20 Paolo Innocenti scripsit > Hi all, > > We are attempting to identify significantly differentially expressed > genes between two treatments of female Drosophila melanogaster, using > results from Affymetrix Drosophila2 microarray chip hybridization (8 > chips, 4 per treatment). The end result is that the number of > significantly differentially expressed genes for a particular FDR > changes with each run, despite the input data and the program code > remaining unchanged. > > We are using ?vsnrma? for pre-processing, which utilizes a search > algorithm to find the maximum likelihood transformed values. The > procedure produces slightly different calibrated GLOG2-transformed > expression values on each run, despite the untransformed expression > values and the R code being identical each time. Our first guess is that > the search algorithm is unable to locate the likelihood peak and settles > on a slightly different set of transformed values each time. Is this a > possibility? If so, is there anything we can do to fix it? E.g. increase > the number of iterations, or modify the search algorithm. > > Many thanks in advance. > Paolo Innocenti > > >> library(affy) > Loading required package: Biobase > Loading required package: tools > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > >> library(vsn) > Loading required package: lattice > Loading required package: limma >> data <- ReadAffy(celfile.path="./celfiles") >> sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4") >> eset1 <- vsnrma(data) > 100% done.4 x 8 matrix (1 stratum). 0% done. > Please use 'meanSdPlot' to verify the fit. > Calculating Expression >> eset2 <- vsnrma(data) > 100% done.4 x 8 matrix (1 stratum). 0% done. > Please use 'meanSdPlot' to verify the fit. > Calculating Expression >> exprs(eset1)[1,] > z1 z2 z3 z4 m1 m2 m3 m4 > 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031 >> exprs(eset2)[1,] > z1 z2 z3 z4 m1 m2 m3 m4 > 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277 >> head(exprs(eset1))==head(exprs(eset2)) > z1 z2 z3 z4 m1 m2 m3 m4 > 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622893_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622894_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622895_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622896_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE >> sessionInfo() > R version 2.8.0 (2008-10-20) > i686-pc-linux-gnu > > locale: > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_ US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDEN TIFICATION=C > > > attached base packages: > [1] tools stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] drosophila2cdf_2.3.0 vsn_3.8.0 limma_2.16.2 > [4] lattice_0.17-15 affy_1.20.0 Biobase_2.2.0 > > loaded via a namespace (and not attached): > [1] affyio_1.10.0 grid_2.8.0 preprocessCore_1.4.0 > >
ADD COMMENT
0
Entering edit mode
Wolfgang Huber wrote: > 1. You say "number of ... differentially expressed genes for a > particular FDR changes with each run". By how much? The variation should > be tiny (negligible?), otherwise I would be worried about the stability > of your method to compute differential expressedness, and FDR. I use the simplest possible approach in limma (one factor with 2 levels) following the userguide. The problem seems to be that the distribution of adjusted p.values is rather odd: instead of having a peak at 0, I have a clear peak between 0.05 and 0.1, so that a minimum difference in the normalization have a rather big effect on the number of my differentially expressed genes (I fixed the cutoff at 0.1). The difference is the order of 5% of d.e. probesets found (range between 1700 and 1900 on a total of 18000). > Please see the code example below for how to deactivate the subsampling. Deactivating subsampling fix the problem! Thank you very much! Best, paolo -- Paolo Innocenti Department of Animal Ecology, EBC Uppsala University Norbyv?gen 18D 75236 Uppsala, Sweden
ADD REPLY

Login before adding your answer.

Traffic: 837 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6