problem regarding runtime of pamr.cv


ALok ▴ 170


Hi all,

I am using pamr.cv to cross-validate the NSC classifier. The function runs without error, but it prints every fold index, like this:

Fold 1 :123456789101112131415161718192021222324252627282930
Fold 2 :123456789101112131415161718192021222324252627282930

Each 10-fold CV run takes approx. 60 sec with 4 GB of memory. When running 10K simulations this factor will increase 10K times, i.e. approx. 1.6 hrs.

I have to do 10,000 iterations of 10-fold CV for several datasets, and each run takes more than 15 hrs just to print the fold index, which no user needs in any case.

My question: is there any way to stop the fold index from being printed on screen, so that the simulation time can be reduced?

Thanks in advance.

Regards,
ALok

--
Ph.D scholar
Centre of Computational Biology and Bioinformatics
School of Information Technology
JNU New Delhi

pamr pamr • 1.6k views

ADD COMMENT • link updated 16.9 years ago by James W. MacDonald 68k • written 16.9 years ago by ALok ▴ 170


James W. MacDonald 68k


Hi ALok,

I think you are mistaken. It is very unlikely that R would take 60 seconds to print a number to stdout. Instead, the numbers are printed from a for() loop (nested within another for() loop) that also runs the cross-validation. So what takes the 60 seconds is the fitting of the pam model, not the printing of the fold index.

The authors have hard-coded this printing into their code, so unless you want to modify that code, you won't be able to stop it.

Best,

Jim

James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
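If the goal is only to keep the fold index off the screen, the output can be captured at the call site without touching the package code. The following is a minimal sketch, assuming the fold index is written to standard output with cat(). `noisy_cv` is a hypothetical stand-in for the real pamr.cv call, so the example runs without the package or any data:

```r
# Hypothetical stand-in for pamr.cv(): prints a fold index with cat()
# while doing some placeholder work, then returns a result.
noisy_cv <- function(nfold = 10, n = 30) {
  errors <- numeric(nfold)
  for (fold in seq_len(nfold)) {
    cat("Fold", fold, ":")
    for (i in seq_len(n)) {
      cat(i)
      errors[fold] <- errors[fold] + (i %% 3 == 0)  # placeholder "work"
    }
    cat("\n")
  }
  errors
}

# capture.output() evaluates the expression and collects everything written
# to stdout instead of displaying it. The assignment inside the call still
# takes effect in the calling environment.
printed <- capture.output(res <- noisy_cv())

length(res)      # one result per fold
length(printed)  # one captured line per fold
```

With the real function, the same pattern would wrap the existing call, for example capture.output(cv <- pamr.cv(fit, data)), where `fit` and `data` are the objects already in use. This only hides the output. Since the time is spent fitting the model, it should not be expected to shorten the run noticeably. Output sent to stderr (for example via message()) is not captured by default.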

ADD COMMENT • link 16.9 years ago James W. MacDonald 68k


Thanks Vincent and James.

As James pointed out, this is the output of two for loops, but every 10-fold validation prints ((30+6) x 10) ~ 360 characters, which are of no use and take some time to print. If we can save that time, say approx. 10 sec in each run, then over 1000 simulations everybody can save at least 2.7 hrs.

Along the lines Vincent suggested, I modified the code in pamr.listgenes.r to stop printing the gene list after extracting it. For a small subset of genes it doesn't matter, but printing takes time for a larger set of genes (more than 10K).

I tried to find nsccv.R on my Linux machine, but it is not there. I found the pamr script, which has the structure for the fold printing, but I could not get it to work.
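The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the poster's own reasoning: the 10-second saving per run is the assumption stated in the post, not a measured value, and the variable names are made up for illustration.

```r
# Characters printed per 10-fold CV: 30 sample indices plus about 6
# characters of "Fold k :" prefix, for each of 10 folds.
chars_per_cv <- (30 + 6) * 10

# Assumed (not measured) saving per run, and the number of runs.
assumed_saving_s <- 10
n_runs <- 1000

# Total assumed saving, converted from seconds to hours.
saved_hours <- assumed_saving_s * n_runs / 3600

chars_per_cv           # 360
round(saved_hours, 2)  # 2.78
```

The result depends entirely on the assumed 10 seconds per run. If printing 360 characters actually takes a few milliseconds, the total saving shrinks by the same proportion.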

ADD REPLY • link 16.9 years ago ALok ▴ 170


Alok,

Try this:

    system.time(for(i in 1:100000) {cat("a")})

This will print 100000 "a" characters. On my machine, this takes less than 1 second. The printing is really not the issue.

Sean
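The same comparison can be made side by side by timing the printing and the computation separately. This is a rough sketch: `fit_stub` is a hypothetical placeholder for the model fitting done in each fold, not the pamr code, and the matrix size is arbitrary.

```r
# Time spent printing 360 characters, roughly one 10-fold CV's worth of output.
t_print <- system.time(
  for (i in 1:360) cat("a")
)[["elapsed"]]
cat("\n")

# Hypothetical stand-in for the model fitting done inside each fold.
fit_stub <- function() {
  x <- matrix(rnorm(200 * 200), 200, 200)
  sum(svd(x)$d)
}

# Time spent on ten "folds" of placeholder computation.
t_fit <- system.time(
  for (fold in 1:10) fit_stub()
)[["elapsed"]]

# Elapsed seconds for each part; actual values depend on the machine.
c(printing = t_print, fitting = t_fit)
```

For the real workload, wrapping the actual cross-validation call in system.time() gives the total, and the printing share can be estimated from a loop like the first one above.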

ADD REPLY • link 16.9 years ago Sean Davis 21k


Thanks Sean,

I think you are right, but since I am simultaneously analyzing 12 datasets with approx. 100 x 100 x 10 simulations, I want to save time wherever I can. Anyway, I have succeeded to some extent.

ALok

ADD REPLY • link 16.9 years ago ALok ▴ 170
