Inconsistent coefficient values


Guest User (Richard Coulson), 10 February 2012

Hi,

I have a problem using limma to analyse some microarray data. If I run the code below WITHOUT setting a seed, I get slightly different coefficient values each time it is run; the problem does not occur if I do set one (e.g. set.seed(1223762671)):

    library(affy)   # ReadAffy()
    library(vsn)    # vsnrma()
    library(limma)  # lmFit(), contrasts.fit()

    raw.data <- ReadAffy(celfile.path = "CEL directory")
    normalised.data <- vsnrma(raw.data)

    transfect.lmFit <- lmFit(normalised.data, design.matrix)
    cont.lmFit <- contrasts.fit(transfect.lmFit, cont.matrix)

That is, the values in cont.lmFit$coefficients change from one R session to another.

Could anyone help with this?

Many thanks,
Richard.

-- output of sessionInfo():

    > sessionInfo()
    R version 2.12.2 (2011-02-25)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] hugene11stv1cdf_1.26.0 limma_3.4.4            vsn_3.16.0
    [4] affyPLM_1.24.0         preprocessCore_1.10.0  gcrma_2.20.0
    [7] affy_1.26.1            Biobase_2.8.0

    loaded via a namespace (and not attached):
    [1] affyio_1.16.0     Biostrings_2.16.9 grid_2.12.2       IRanges_1.6.8
    [5] lattice_0.19-17   splines_2.12.2    tools_2.12.2

cont.matrix has three contrasts:

    C.GFP-mock = (C.GFP.24+C.GFP.48+C.GFP.72)-(mock.24+mock.48+mock.72)
    N.GFP-mock = (N.GFP.24+N.GFP.48+N.GFP.72)-(mock.24+mock.48+mock.72)
    myc-mock   = (myc.24+myc.48+myc.72)-(mock.24+mock.48+mock.72)

    Levels            C.GFP-mock  N.GFP-mock  myc-mock
    C.GFP.24                   1           0         0
    C.GFP.48                   1           0         0
    C.GFP.72                   1           0         0
    mock.24                   -1          -1        -1
    mock.48                   -1          -1        -1
    mock.72                   -1          -1        -1
    myc.24                     0           0         1
    myc.48                     0           0         1
    myc.72                     0           0         1
    N.GFP.24                   0           1         0
    N.GFP.48                   0           1         0
    N.GFP.72                   0           1         0
    untransfected.0            0           0         0

-- Sent via the guest posting facility at bioconductor.org.


Vincent J. Carey, Jr. (10 February 2012)

I believe you need to have a look at the documentation of vsn2:

    subsample: Integer of length 1. If its value is greater than 0, the
        model parameters are estimated from a subsample of the data
        of size 'subsample' only, yet the fitted transformation is
        then applied to all data. For large datasets, this can
        substantially reduce the CPU time and memory consumption at a
        negligible loss of precision. Note that the 'AffyBatch'
        method of 'vsn2' sets a value of '30000' for this parameter
        if it is missing from the function call - which is different
        from the behaviour of the other methods.
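The effect Vincent points to can be reproduced without any Bioconductor packages. The sketch below is a toy stand-in, not vsn's actual model: a hypothetical `fit_param()` estimates a single parameter (a median) from a random subsample, mimicking the "estimate on a subsample, apply to all data" idea in the vsn2 documentation. The data vector is a deterministic placeholder, and the seed is the one from Richard's post.

```r
# Toy stand-in for a model parameter estimated from a random subsample.
# NOT vsn's algorithm: it only mimics the "fit on a subsample" idea.
fit_param <- function(x, subsample) {
  # subsample > 0: draw a random subset of indices, which consumes R's RNG
  idx <- if (subsample > 0) sample.int(length(x), subsample) else seq_along(x)
  median(x[idx])  # parameter estimated from the chosen subset only
}

x <- qexp(ppoints(1e5))  # deterministic placeholder for 100,000 intensities

# Two unseeded calls draw different subsets, so the estimates usually differ slightly
a <- fit_param(x, subsample = 30000)
b <- fit_param(x, subsample = 30000)

# Resetting the seed before each call reproduces the same subset and estimate
set.seed(1223762671); s1 <- fit_param(x, subsample = 30000)
set.seed(1223762671); s2 <- fit_param(x, subsample = 30000)
identical(s1, s2)  # TRUE
```

A fresh R session starts from a different random state, so the unseeded case corresponds to the session-to-session drift Richard saw in cont.lmFit$coefficients.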


Wolfgang Huber

Hi Richard,

Vince is spot on. Do you have a reason to worry that the differences in results are non-negligible?

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
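One way to answer Wolfgang's question in practice is to save cont.lmFit$coefficients from two runs (for example with saveRDS()) and measure how far they differ. The sketch below is a hypothetical base-R helper, `compare_coefs()`; the two small matrices are made-up toy values standing in for real coefficient matrices, not results from this data set, and the tolerance is an arbitrary illustrative choice.

```r
# Hypothetical helper: summarise how much two runs' coefficient matrices differ.
# coef_a, coef_b: numeric matrices of identical shape (e.g. probesets x contrasts),
# such as cont.lmFit$coefficients saved from two separate R sessions.
compare_coefs <- function(coef_a, coef_b, tol = 0.01) {
  stopifnot(identical(dim(coef_a), dim(coef_b)))
  d <- abs(coef_a - coef_b)
  list(
    max_abs_diff     = max(d),            # largest change in any coefficient
    per_contrast_max = apply(d, 2, max),  # largest change within each contrast
    correlation      = cor(as.vector(coef_a), as.vector(coef_b)),
    # all.equal() compares the mean relative difference against tol
    within_tol       = isTRUE(all.equal(coef_a, coef_b, tolerance = tol))
  )
}

# Toy values only (3 probesets x 2 contrasts), not real results
run1 <- matrix(c(1.00, -0.50, 2.00, 0.10, 0.30, -1.20), ncol = 2)
run2 <- run1 + matrix(c(0.002, -0.001, 0, 0.001, -0.003, 0.002), ncol = 2)
res <- compare_coefs(run1, run2)
res$max_abs_diff
```

Differences far below the spread of the coefficients themselves would support Vince's "negligible loss of precision" reading.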


Richard Coulson

Hi,

Many thanks: setting subsample to 0 seems to have fixed the problem, though it does slow down the normalisation step a bit.

Thanks again,
Richard.
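Setting subsample = 0 removes the random draw altogether, which explains both observations: the results become reproducible, and the step is slower because the parameters are fitted on all the data. In Richard's pipeline this would presumably be written as vsnrma(raw.data, subsample = 0), assuming vsnrma() forwards extra arguments to vsn2(). The runnable sketch below uses only a hypothetical toy estimator, `fit_param()`, in base R, not vsn itself.

```r
# Toy estimator: subsample > 0 draws a random subset; subsample = 0 uses all data
fit_param <- function(x, subsample) {
  idx <- if (subsample > 0) sample.int(length(x), subsample) else seq_along(x)
  median(x[idx])
}

x <- qexp(ppoints(1e5))  # deterministic placeholder for probe intensities

# Simulate two "sessions" with unrelated RNG states
set.seed(1); full1 <- fit_param(x, subsample = 0)
set.seed(2); full2 <- fit_param(x, subsample = 0)
identical(full1, full2)  # TRUE: no random draw, so the RNG state is irrelevant

# With the same two RNG states, subsampled estimates usually differ slightly
set.seed(1); sub1 <- fit_param(x, subsample = 30000)
set.seed(2); sub2 <- fit_param(x, subsample = 30000)

# Assumed real-pipeline equivalent (not run here; needs CEL files and vsn):
# normalised.data <- vsnrma(raw.data, subsample = 0)
```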

