Inconsistent coefficient values
Guest User ★ 13k
@guest-user-4897
Last seen 10.4 years ago
Hi,

I have a problem with using 'limma' when I'm analysing some microarray data. If I run the below code WITHOUT setting a seed, I get slightly different values for the coefficients each time it's run; however this problem does not occur if I do set one (e.g. set.seed(1223762671)):

raw.data <- ReadAffy( celfile.path="CEL directory" )
normalised.data <- vsnrma(raw.data)

transfect.lmFit <- lmFit( normalised.data, design.matrix )
cont.lmFit <- contrasts.fit(transfect.lmFit, cont.matrix)

i.e. the values in cont.lmFit$coefficients are altered from one R session to another.

Please could anyone help with this?

Many thanks,
Richard.

-- output of sessionInfo():

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] hugene11stv1cdf_1.26.0 limma_3.4.4           vsn_3.16.0
[4] affyPLM_1.24.0         preprocessCore_1.10.0 gcrma_2.20.0
[7] affy_1.26.1            Biobase_2.8.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0   Biostrings_2.16.9 grid_2.12.2 IRanges_1.6.8
[5] lattice_0.19-17 splines_2.12.2    tools_2.12.2

> cont.matrix
                 Contrasts
Levels            (C.GFP.24+C.GFP.48+C.GFP.72)-(mock.24+mock.48+mock.72)
  C.GFP.24         1
  C.GFP.48         1
  C.GFP.72         1
  mock.24         -1
  mock.48         -1
  mock.72         -1
  myc.24           0
  myc.48           0
  myc.72           0
  N.GFP.24         0
  N.GFP.48         0
  N.GFP.72         0
  untransfected.0  0
                 Contrasts
Levels            (N.GFP.24+N.GFP.48+N.GFP.72)-(mock.24+mock.48+mock.72)
  C.GFP.24         0
  C.GFP.48         0
  C.GFP.72         0
  mock.24         -1
  mock.48         -1
  mock.72         -1
  myc.24           0
  myc.48           0
  myc.72           0
  N.GFP.24         1
  N.GFP.48         1
  N.GFP.72         1
  untransfected.0  0
                 Contrasts
Levels            (myc.24+myc.48+myc.72)-(mock.24+mock.48+mock.72)
  C.GFP.24         0
  C.GFP.48         0
  C.GFP.72         0
  mock.24         -1
  mock.48         -1
  mock.72         -1
  myc.24           1
  myc.48           1
  myc.72           1
  N.GFP.24         0
  N.GFP.48         0
  N.GFP.72         0
  untransfected.0  0

-- Sent via the guest posting facility at bioconductor.org.
@vincent-j-carey-jr-4
Last seen 19 days ago
United States
I believe you need to have a look at the documentation of vsn2:

subsample: Integer of length 1. If its value is greater than 0, the model parameters are estimated from a subsample of the data of size 'subsample' only, yet the fitted transformation is then applied to all data. For large datasets, this can substantially reduce the CPU time and memory consumption at a negligible loss of precision. Note that the 'AffyBatch' method of 'vsn2' sets a value of '30000' for this parameter if it is missing from the function call - which is different from the behaviour of the other methods.

On Fri, Feb 10, 2012 at 5:22 AM, Richard Coulson [guest] <guest@bioconductor.org> wrote:
> i.e. the values in cont.lmFit$coefficients are altered from one R session
> to another.
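To make the mechanism in the quoted vsn2 documentation concrete, here is a minimal base-R sketch. It is a toy illustration only, not the vsn algorithm: fit_offset() is a hypothetical stand-in for "estimate model parameters from a random subsample", and the simulated intensities are made up. It shows why a random subsample gives slightly different estimates from run to run, and why either fixing the seed or using all of the data makes the result repeatable.

```r
# Toy illustration (NOT vsn itself): estimate a parameter from a random
# subsample of the data. fit_offset() is a hypothetical helper.
fit_offset <- function(x, subsample = 0) {
  # subsample > 0: estimate from a random subset of that size
  # subsample == 0: estimate from every value (deterministic)
  used <- if (subsample > 0) sample(x, subsample) else x
  mean(used)
}

set.seed(42)  # only so that the simulated data are the same each time
intensities <- rexp(100000, rate = 1 / 200)

# Two consecutive fits draw different random subsets,
# so the two estimates differ slightly.
a <- fit_offset(intensities, subsample = 30000)
b <- fit_offset(intensities, subsample = 30000)

# Option 1: set the same seed before each fit.
set.seed(1223762671)
c1 <- fit_offset(intensities, subsample = 30000)
set.seed(1223762671)
c2 <- fit_offset(intensities, subsample = 30000)

# Option 2: do not subsample at all.
d1 <- fit_offset(intensities, subsample = 0)
d2 <- fit_offset(intensities, subsample = 0)

print(abs(a - b))         # small but non-zero
print(identical(c1, c2))  # same seed, same subset
print(identical(d1, d2))  # no randomness involved
```

In this toy the seeded and the full-data variants are both repeatable; the trade-off the documentation describes is that using all of the data costs more CPU time and memory.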
Hi Richard

Vince is spot on. Do you have a reason to worry that the differences in results are non-negligible?

Best wishes
Wolfgang

Vincent Carey scripsit 02/10/2012 01:44 PM:
> I believe you need to have a look at the documentation of vsn2

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
Hi,

Many thanks - setting subsample to 0 seems to have fixed the problem, though it does slow down the normalisation step a bit.

Thanks again,
Richard.

On 10/02/12 12:44, Vincent Carey wrote:
> I believe you need to have a look at the documentation of vsn2
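For readers who want to apply the fix reported above, the following is a hedged sketch of how the normalisation step from the original post might be written. normalise_reproducibly() is a hypothetical wrapper, not part of any package, and the sketch assumes that extra arguments given to vsnrma() are passed on to vsn2(), which is what the report of setting subsample to 0 suggests. It has not been checked against the package versions listed in the sessionInfo() output.

```r
# Sketch only: normalise_reproducibly() is a hypothetical helper.
# Requires the Bioconductor packages 'affy' and 'vsn' when called.
normalise_reproducibly <- function(cel.dir, subsample = 0) {
  # Read the CEL files, as in the original post
  raw.data <- affy::ReadAffy(celfile.path = cel.dir)
  # Assumption: 'subsample' is forwarded by vsnrma() to vsn2().
  # subsample = 0 estimates the parameters from all data, so no
  # random subset is drawn and repeated runs should agree.
  vsn::vsnrma(raw.data, subsample = subsample)
}

# Usage, with the placeholder directory name from the original post:
# normalised.data <- normalise_reproducibly("CEL directory")
```

The alternative from the original post, calling set.seed() immediately before vsnrma(), keeps the faster subsampled fit while still giving the same result on each run.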