Question: large amount of slides
15.2 years ago, Vada Wilcox wrote:
Dear all,

I have been using RMA successfully for a while now, but so far only on a small number of slides. I would now like to do my study on a larger scale, including data (series of experiments) from other researchers. My question is the following: if I want to study, let's say, 200 slides, do I have to read them all into R at once (that is, all together with read.affy() in package affy), or is it OK to read them series by series (so all wild types and controls of one researcher at a time)?

If it is really necessary to read all of them in at one time, how much RAM would I need (for, let's say, 200 CEL files), and how can I raise the RAM? I know it's possible to raise it by using 'max vsize = ...', but I haven't been able to do so successfully for 200 experiments. Can somebody help me with this?

Many thanks in advance,

Vada
Answer: large amount of slides
15.2 years ago, Adaikalavan Ramasamy wrote:
This is what I do:

1. Randomly split the arrays into manageable chunks, say 4 batches of 50 (depends on the computer).
2. Run RMA on each batch separately.
3. Combine the 4 batches (e.g. cbind/merge) into one finalised dataset.
4. Repeat B times and take the average of the B datasets.

From past experience, the coefficient of variation is less than 0.03 for 99% of probesets if you use B = 20 - 30. If you like, I can send my Perl wrapper script that does this. It assumes you can submit multiple jobs (e.g. on a cluster or a big server), but you can easily modify it.

I don't know much about increasing RAM. You can try just.rma(..., destructive=TRUE), but I am not sure whether this uses significantly less RAM.

Regards, Adai.

In reply to Vada Wilcox's message of Fri, 2004-06-04 at 16:06.
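The four steps in this answer (split, normalise per batch, combine, average over B repeats) can be sketched in base R as follows. This is only a minimal illustration, not the Perl wrapper mentioned in the answer. The names batched_average and normalise_fn are hypothetical, and the toy normaliser merely stands in for a real RMA call so that the example runs without Bioconductor installed.

```r
# Sketch of the split / normalise / combine / average scheme.
# `normalise_fn` is a hypothetical stand-in: it takes a character vector of
# file names and returns a numeric matrix (probesets x arrays) whose column
# names are those file names. With the affy package it might wrap something
# like exprs(justRMA(filenames = files)); that wiring is an assumption.
batched_average <- function(files, normalise_fn, batch_size = 50, B = 20) {
  n_batches <- ceiling(length(files) / batch_size)
  total <- NULL
  for (b in seq_len(B)) {
    # 1. Randomly assign the files to batches of (roughly) equal size.
    shuffled <- sample(files)
    batch_id <- rep(seq_len(n_batches), length.out = length(shuffled))
    batches <- split(shuffled, batch_id)
    # 2. Normalise each batch separately.
    parts <- lapply(batches, normalise_fn)
    # 3. Combine the batches and restore the original column order.
    combined <- do.call(cbind, unname(parts))[, files, drop = FALSE]
    # 4. Accumulate, so that the B combined datasets can be averaged.
    total <- if (is.null(total)) combined else total + combined
  }
  total / B
}

# Toy demonstration: 8 made-up file names and a fake normaliser whose
# output depends on which files share a batch.
toy_files <- sprintf("chip%03d.CEL", 1:8)
toy_normalise <- function(files) {
  m <- matrix(seq_along(files), nrow = 5, ncol = length(files), byrow = TRUE,
              dimnames = list(paste0("probeset", 1:5), files))
  m - mean(m)  # centre within the batch
}

set.seed(1)
avg <- batched_average(toy_files, toy_normalise, batch_size = 4, B = 10)
dim(avg)
```

Each repeat draws a fresh random partition, so the average smooths out the dependence of the normalised values on batch membership. On a cluster the B repeats are independent and could be submitted as separate jobs.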
Answer: large amount of slides
15.2 years ago, Richard Park wrote:
Hi Vada,

I would caution you against running RMA on that many datasets. I have noticed a trend in RMA that things get even more underestimated as the number and variance of the datasets increase. I have been doing an analysis of immune cell types with about 100 CEL files. My computer (Windows 2000, 2 GB of RAM, 2.6 GHz Pentium 4) gives out at around 70 datasets; I am pretty sure that my problem is that Windows 2000 has a maximum allocation of 1 GB.

But if most of your data is closely related (i.e. same tissues, just a KO vs WT), you should be fine with RMA. I would caution against using RMA on data that is very different.

hth,
richard
Answer: large amount of slides
15.2 years ago, Roel Verhaak wrote:
I have successfully run GCRMA on a dataset of 285 HGU133a chips, on a machine with 8 GB of RAM installed; I noticed a peak memory use of 5.5 GB (although I have not been monitoring it continuously). I would say 200 chips use proportionally less memory, so around 4 GB.

Roel Verhaak
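The "around 4 GB" figure follows from scaling the one observed data point linearly with the number of chips. A small R sketch of that arithmetic is given here. It assumes memory use is strictly proportional to chip count with no fixed overhead, which is a simplification, and the variable and function names are illustrative only.

```r
# Linear extrapolation from the single observation reported in this answer.
observed_chips   <- 285   # chips in the GCRMA run
observed_peak_gb <- 5.5   # peak memory seen during that run

# Assumption: memory scales in proportion to the number of chips.
gb_per_chip <- observed_peak_gb / observed_chips
estimate_peak_gb <- function(n_chips) n_chips * gb_per_chip

round(estimate_peak_gb(200), 2)
```

This gives roughly 3.9 GB for 200 chips, in line with the estimate in the answer. Since the peak was not monitored continuously, the true figure may be higher.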
Answer: large amount of slides
15.2 years ago, Marcus Davy wrote:
Hi,

you can use the function object.size to estimate the storage of any expression set object, e.g.

    > object.size(affybatch.example)
    [1] 243384
    > dim(exprs(affybatch.example))
    [1] 10000 3
    > object.size(exprs(affybatch.example))
    [1] 240280
    > object.size(exprs(affybatch.example)) / (nrow(exprs(affybatch.example))*ncol(exprs(affybatch.example)))
    [1] 8.009333

Each double-precision matrix value should take 8 bytes of storage, so you can estimate the amount of memory required for n genes by 200 arrays, plus annotation information etc.

On a *standard* Windows XP (or 2000) machine running R 1.9.0 you can increase the addressable memory space with the --max-mem-size=2G argument when you run the executable; details are in the Windows FAQ. Check that it has increased with:

    > memory.limit()
    [1] 2147483648

Memory-intensive algorithms could start running out of addressable memory on some 32-bit architectures for large datasets. For example, Bioconductor's siggenes SAM permutation testing function with B=1000 on 27000 genes is likely to have problems on some 32-bit platforms, depending on physical memory and the virtual page size available to the operating system.

marcus
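The 8-bytes-per-value rule from this answer can be turned into a quick estimate. The R sketch below is only an illustration: the helper expression_matrix_bytes is hypothetical, and 27000 rows is borrowed from the siggenes example purely as a sample size. It covers only the final expression matrix. The probe-level data held while reading CEL files has one row per probe cell rather than per gene, so it is considerably larger.

```r
# Estimate the storage of a dense matrix of doubles (8 bytes per value).
bytes_per_double <- 8
expression_matrix_bytes <- function(n_rows, n_arrays) {
  n_rows * n_arrays * bytes_per_double
}

# Illustrative size: 27000 genes by 200 arrays.
est_bytes <- expression_matrix_bytes(27000, 200)
est_mib   <- est_bytes / 1024^2
est_bytes
est_mib

# Cross-check the per-value cost on a small matrix; the ratio is slightly
# above 8 because the object also carries a small header.
m <- matrix(0, nrow = 1000, ncol = 3)
as.numeric(utils::object.size(m)) / length(m)
```

For 27000 by 200 this comes to 43,200,000 bytes, about 41 MiB, before annotation and any working copies made during normalisation.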