memory bottleneck on Linux
Tapan Mehta ▴ 270
@tapan-mehta-165
Last seen 10.3 years ago
Hello, I am trying to process 100 CEL files, each about 10 MB, on a Linux-based Beowulf cluster whose nodes each have 4 GB of RAM. I am trying to use the mas5 method in the affy library, but I am unable to finish: the run fails with memory problems. I cannot raise the limit, since the memory-limit function in R is Windows-only (as posted on the R mailing list). Could anybody guide me through this problem? Regards, Tapan Mehta
• 1.5k views
A.J. Rossini ▴ 810
@aj-rossini-209
Last seen 10.3 years ago
Tapan Mehta <tapmehta@yahoo.com> writes:
> I am trying to run a task of 100 CEL files each of 10 MB on a Linux-based Beowulf cluster with each node of 4 GB. [...]

Have you modified the affy library to actually use the cluster? If not, why do you think the cluster will help? I'm assuming that you are simply running on a single node with 4 GB of RAM, which might not be enough.

best,
-tony

--
A.J. Rossini / rossini@u.washington.edu / rossini@scharp.org
http://software.biostat.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, HVTN/SCHARP, Fred Hutchinson Cancer Research Center.
Thanks a lot for your reply. No, I haven't modified the affy library for the cluster, because I thought 4 GB of RAM should suffice. Is there anything in particular available for parallel computing in the Bioconductor project?

Regards,
Tapan Mehta

--- "A.J. Rossini" <rossini@blindglobe.net> wrote:
> Have you modified the affy library to actually use the cluster? [...]
Ben Bolstad ★ 1.1k
@ben-bolstad-93
Last seen 10.3 years ago
I am surprised that you cannot get 100 CEL files to run successfully; 4 GB (on a single machine) should be more than sufficient. Because the MAS5 method is not a multi-chip method, you can compute the expression measures in smaller groups (say, groups of 25), join the expression measures from each group together, and then scale them appropriately.

If you are using the mas5() function from the affy package, remember to set normalize=FALSE when processing each group (so as not to scale the groups individually).

Ben

--
Ben Bolstad <bolstad@stat.berkeley.edu>
http://www.stat.berkeley.edu/~bolstad
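For example, a minimal sketch of this grouped approach (not from the thread itself: the file locations, the group size of 25, and the use of MAS5's default scaling, a 2% trimmed mean brought to a target of 500, are all assumptions):

```r
library(affy)

## Hypothetical: CEL files assumed to be in the working directory
cel.files <- list.files(pattern = "\\.CEL$")
groups <- split(cel.files, ceiling(seq_along(cel.files) / 25))   # batches of 25

## Compute unscaled MAS5 expression values one group at a time
mats <- lapply(groups, function(fs) {
  abatch <- ReadAffy(filenames = fs)
  exprs(mas5(abatch, normalize = FALSE))   # normalize = FALSE: no per-group scaling
})
signal <- do.call(cbind, mats)

## Apply MAS5-style scaling once over all arrays:
## multiply each array so its 2%-trimmed mean hits the target (sc = 500)
sc <- 500
signal <- apply(signal, 2, function(x) x * sc / mean(x, trim = 0.02))
```

Only one group's worth of raw probe data is in memory at a time; the combined matrix of expression values is small by comparison.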
I tried using rma instead of mas5 to check whether the memory restrictions still apply. Unfortunately they do, and right now I am finding it impossible to process all 100 files together. mas5 does work for smaller batches of 20 or 25 chips, but it would be great to find a solution that can handle all 100 files, since rma needs to process the 100 files in a single batch. Are these algorithms (mas5, rma) parallelizable, or can the programs be modified to make optimum use of the cluster?
I don't know about taking advantage of the cluster, but you can easily write a function that reads the CEL files in one by one (not keeping them all in memory) and computes MAS5 values for each one, then normalizes at the end. For RMA, try justRMA; looking at the code of justRMA can help you write the function you want for mas5.

I believe that with justRMA you can handle 100 HG-U133 CEL files with less than 1 GB.
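For instance, a rough sketch of the read-one-at-a-time idea together with justRMA (the file listing is an assumption; the combined MAS5 matrix would still be scaled at the end, as in the sketch in the previous answer):

```r
library(affy)

cel.files <- list.files(pattern = "\\.CEL$")   # hypothetical location

## MAS5 one chip at a time: only a single-array AffyBatch is ever in memory
mas5.cols <- lapply(cel.files, function(f) {
  exprs(mas5(ReadAffy(filenames = f), normalize = FALSE))
})
mas5.signal <- do.call(cbind, mas5.cols)       # scale afterwards

## RMA without keeping a full AffyBatch of raw intensities around
eset <- justRMA(filenames = cel.files)
```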
Thanks a lot for the help; however, I am getting the same problem even after using the justRMA method and raising the ulimit for the shell. I am working with HG-U95 CEL files, each of which is nearly 10 MB. I will look into how I can modify mas5 for the cluster and give you an update.

Thanks again,
Tapan Mehta

--- "Rafael A. Irizarry" <ririzarr@jhsph.edu> wrote:
> for RMA try justRMA. looking at the code for justRMA can help you write the function you want for mas5. [...]
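For what it's worth, since each chip's MAS5 background correction and summarization are independent of the other chips, the per-chip work could in principle be spread across the cluster. A very rough sketch with the snow package (not something confirmed in this thread: the cluster type, node count, and shared file path are assumptions, and snow plus affy must be installed on every node):

```r
library(snow)
library(affy)

## Hypothetical shared directory visible to every node
cel.files <- list.files("/shared/celfiles", pattern = "\\.CEL$", full.names = TRUE)

cl <- makeCluster(4, type = "SOCK")        # type = "MPI" may suit a Beowulf setup better
clusterEvalQ(cl, library(affy))            # load affy on every worker

## One chip per task; normalize = FALSE so no group is scaled separately
cols <- parLapply(cl, cel.files, function(f) {
  exprs(mas5(ReadAffy(filenames = f), normalize = FALSE))
})
stopCluster(cl)

signal <- do.call(cbind, cols)
signal <- apply(signal, 2, function(x) x * 500 / mean(x, trim = 0.02))  # MAS5-style scaling
```

Only the final scaling step needs to see all arrays at once, and it runs on the master over the small matrix of expression values.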