memory usage ReadAffy() and probe level information


Roel Verhaak ▴ 70


Hi,

I have two questions regarding the use of the affy package:

- I have a large series of CEL files and am trying to read them all at once. Unfortunately, the ReadAffy function seems to use a lot of memory. My workstation has 2 GB of RAM installed, but trying to read >100 CEL files (HGU133a) is a no-go. This is R 1.8.1 with Bioconductor 1.3 on a Windows machine. Reading in 50 CEL files already means a peak memory usage of 800 MB. Is there any solution to this? I would like to read 300 CEL files at the same time if possible. I have played around with the memory.limit() function, but with no success.

- Second, after data import I would like to retrieve all probe-level information for several probe sets. I do this using the pm() function, for instance pm(CelFiles)[1:16, 1:50]. The problem is that I first have to find out the exact "location" of the probe set of interest. This is not a big problem, but I thought there might be a more elegant solution.

Thanks in advance,
Roel Verhaak

probe affy • 1.4k views

Updated 21.7 years ago by Tan, MinHan ▴ 180 • written 21.7 years ago by Roel Verhaak ▴ 70


James W. MacDonald 68k


Simply put, you are not going to be able to read in >300 CEL files with only(!) 2 GB of RAM. You should be able to run justRMA on 100 or so chips with this much RAM, but if you really want to process huge numbers of chips you will have to upgrade to a 64-bit architecture, which at this point in time also means switching to Linux.

To get the PM probes, I think you want to do something like this:

my.pms <- probes(abatch, "pm", "1007_s_at") # e.g., for the 1007_s_at probes

Best,
Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
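The "location" hunting described in the question goes away once rows are selected by probe-set name rather than by position, which is the idea behind the probes() call above. The following is only a minimal base-R sketch of that idea: toy_pm and toy_probe_names are invented stand-ins for what pm() and probeNames() would give on a real AffyBatch, and pm_for_sets() is a hypothetical helper, not part of affy.

```r
# Toy stand-in for pm(abatch): 6 probes (rows) x 3 chips (columns).
# All values and names here are invented for illustration.
set.seed(1)
toy_pm <- matrix(round(runif(18, 100, 1000)), nrow = 6,
                 dimnames = list(NULL, c("chip1", "chip2", "chip3")))

# Toy stand-in for probeNames(abatch): the probe set each row belongs to.
toy_probe_names <- c("1007_s_at", "1007_s_at", "1053_at",
                     "1053_at", "1053_at", "117_at")

# Hypothetical helper: select rows by probe-set name instead of by index.
pm_for_sets <- function(pm_matrix, probe_names, sets) {
  keep <- probe_names %in% sets
  out <- pm_matrix[keep, , drop = FALSE]
  rownames(out) <- probe_names[keep]  # label each row with its probe set
  out
}

sel <- pm_for_sets(toy_pm, toy_probe_names, c("1053_at", "117_at"))
print(sel)
```

With the toy data, sel has four rows (three for 1053_at, one for 117_at) and one column per chip, so no row positions need to be looked up by hand.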


Arne.Muller@aventis.com ▴ 620


Hi,

I'd be interested in a solution myself ;-) . Reading in many CEL files is actually not the bottleneck! When it comes to using "expresso" with normalisation across the CEL files, the memory usage increases again, and you may eventually exceed all of your 2 GB of memory. I'm normalizing 42 MG-U74Av2 chips, and expresso takes about 800 MB for rma + quantiles + pmonly + medianpolish.

It was mentioned previously on this list (I don't remember exactly when) that the justRMA method performs exactly the above normalization procedure, but is a lot faster and uses a lot less memory. You feed the CEL file names to the routine, not an AffyBatch object (see help(justRMA)).

Of course, if you'd like to use other normalization or background correction methods you're pretty much bound to use expresso, which will probably need too much memory for your 100 CEL files ...

Sorry, I have no suggestion for your second problem (probe set location).

regards,
Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
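The justRMA route described above might look like the sketch below. It assumes the CEL files sit in the current working directory (an assumption, adjust the path as needed) and that the affy package is installed; the guard simply skips the call otherwise.

```r
# Assumption: the CEL files are in the current working directory.
cel_files <- list.files(pattern = "\\.cel$", ignore.case = TRUE,
                        full.names = TRUE)

if (length(cel_files) > 0 && requireNamespace("affy", quietly = TRUE)) {
  # justRMA() is given file names rather than an AffyBatch object and
  # returns RMA expression values.
  eset <- affy::justRMA(filenames = cel_files)
  print(dim(Biobase::exprs(eset)))  # probe sets x chips
} else {
  message("No CEL files found or affy not installed; nothing to do.")
}
```

The trade-off is the one noted in the post: this covers RMA only, so other background correction or normalization methods still require expresso on an AffyBatch.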


Tan, MinHan ▴ 180


The limit for justRMA that I have encountered is approximately 110-120 HGU133 Plus 2.0 CEL files, with 2 GB RAM and R 1.9.0 beta running on Windows. I do not encounter a memory error directly - the R GUI application just gracefully folds up and reminds me to send an error report to Bill Gates.

Anyway, I'm not sure whether it translates directly in terms of number of transcripts, but that limit should allow >200 HGU133A chips?

I am told that the maximum addressable memory for a 32-bit processor is 4 GB. Would it make a difference if the actual physical memory were increased from 2 GB to 4 GB (even though my memory.limit has already been set to 4095)?

Min-Han Tan
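As a rough back-of-the-envelope check of that guess, the sketch below assumes that memory use scales with the number of cells per array (an assumption, not something established in this thread) and uses array dimensions of 712 x 712 for HG-U133A and 1164 x 1164 for HG-U133 Plus 2.0. It also estimates the raw size of a single copy of the intensities for 300 HG-U133A chips stored as 8-byte doubles.

```r
# Cells per array (rows x columns of the chip layout).
cells_u133a <- 712 * 712
cells_plus2 <- 1164 * 1164

# How many times larger a Plus 2.0 array is than a U133A array.
ratio <- cells_plus2 / cells_u133a

# Scale the reported Plus 2.0 limit (110-120 chips) to U133A,
# assuming memory is proportional to the number of cells.
plus2_limit <- c(110, 120)
u133a_estimate <- floor(plus2_limit * ratio)
print(u133a_estimate)

# Size in GiB of ONE copy of the intensities held as doubles.
bytes_per_value <- 8
gib <- function(n_chips, cells) n_chips * cells * bytes_per_value / 1024^3
print(round(gib(300, cells_u133a), 2))
```

Under these assumptions the Plus 2.0 limit corresponds to roughly 290 to 320 HG-U133A chips, and a single copy of the intensities for 300 such chips already takes a little over 1 GiB, before any working copies are made during processing.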
