Memory problem with rma()


Guest User ★ 13k

@guest-user-4897


Hi,

I am running rma() to background-correct, normalize and summarize a batch of ca. 5500 arrays. I currently have a memory limit of 8 GB and the procedure exceeds it. My guess is that it breaks at the background correction step: when I inspected the temporary directory, the only file that had been modified was tmp_310151_rbg.root (16 GB in size). The code is attached below.

I tried both the latest ROOT version and the one recommended at Bioconductor (root_v5.34.14, root_v5.34.05).

Any idea why there is a memory issue?

    scheme.HuEx <- import.exon.scheme(
        filename   = "Scheme_HuEx-1_0v2r2_hg19",
        layoutfile = "affyHuExome_design/HuEx-1_0-st-v2.r2.clf",
        schemefile = "affyHuExome_design/HuEx-1_0-st-v2.r2.pgf",
        probeset   = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.probeset.csv",
        transcript = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.transcript.csv")

    scheme.HuEx <- root.scheme("Scheme_HuEx-1_0v2r2_hg19.root")

    data.HuEx <- import.data(
        scheme.HuEx,
        filename = "fhsCEL",
        filedir  = "normalizationXPS/",
        celdir   = "expression_CEL_raw/"
    )

    data.HuEx <- root.data(scheme.HuEx, rootfile="fhsCEL_cel.root")

    rma.HuEx.transcript <- rma(data.HuEx, filename="HuEx_RMAquantile",
        filedir="normalizationXPS",
        tmpdir = "normalizationXPS/tmpDir",
        add.data=FALSE, background="antigenomic", normalize=TRUE,
        option="transcript", exonlevel="core")

-- output of sessionInfo():

    R version 3.0.2 (2013-09-25)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=C                 LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] xps_1.22.2

    loaded via a namespace (and not attached):
    [1] tools_3.0.2

--
Sent via the guest posting facility at bioconductor.org.


ADD COMMENT • link updated 10.2 years ago by Stephen Piccolo ▴ 590 • written 10.2 years ago by Guest User ★ 13k


Sean Davis 21k

@sean-davis-490


United States

And what was the actual error that you got?

Sean

ADD COMMENT • link 10.2 years ago Sean Davis 21k


I don't get a proper error message because I'm running the R session in an interactive shell on a cluster (queuing system). When the memory limit of 8gb is reached, my interactive shell is terminated by the queuing system.

ADD REPLY • link 10.2 years ago plichta@cbs.dtu.dk ▴ 20


Hi Damian,

Soon, Christian should reply to you.

In the meantime, for my personal interest and to define plans for the oligo package, would you be willing to try processing your set with oligo?

    library(ff)
    library(oligo)
    cels = list.celfiles()
    raw = read.celfiles(cels)
    res = rma(raw)

If you have multiple cores available, before loading oligo, load a parallel front-end:

    library(doMC)
    registerDoMC(4)

Let me know how it goes, if you have some time to spare...

Thanks a million,
benilton

ADD REPLY • link 10.2 years ago Benilton Carvalho ★ 4.3k


Hi Benilton,

I tried oligo and it choked:

    > ...
    > raw <- read.celfiles(cels)
    Loading required package: pd.huex.1.0.st.v2
    Loading required package: RSQLite
    Loading required package: DBI
    Platform design info loaded.
    Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    In ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),  :
      NAs introduced by coercion

Do you know what this error indicates?

Thanks,
Damian

ADD REPLY • link 10.2 years ago plichta@cbs.dtu.dk ▴ 20


Thanks, Damian,

that's the indication that 'ff' hit the maximum limit in object dimensions... :-(

Thanks for letting me know,
b
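The arithmetic behind such a limit can be checked in base R. The sketch below is only an illustration under two assumptions that are not stated in this thread: that a HuEx-1_0-st-v2 array has 2560 x 2560 cells, and that all intensities are held in a single cells-by-arrays object whose total length must fit into a 32-bit integer. The variable names are illustrative.

```r
# Assumed chip layout (not stated in the thread): 2560 x 2560 cells per array.
cells_per_array <- 2560 * 2560
n_arrays        <- 5500            # "ca. 5500 arrays" from the original post

# Total number of elements in a cells-by-arrays matrix (computed as a double).
n_elements <- cells_per_array * n_arrays

# Largest value a 32-bit integer length or index can take.
int_max <- .Machine$integer.max

exceeds_limit <- n_elements > int_max

# Coercing the element count to integer gives NA (normally with a warning), so a
# later test of the form `length > .Machine$integer.max` sees a missing value.
as_int <- suppressWarnings(as.integer(n_elements))

# Largest number of arrays of this size that stays below the limit.
max_arrays <- floor(int_max / cells_per_array)

cat("total elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds integer limit:", exceeds_limit, "\n")
cat("as.integer() result:", as_int, "\n")
cat("arrays that would still fit:", max_arrays, "\n")
```

Under these assumptions only a few hundred arrays of this size would fit into one such object, which is consistent with the NA coercion warning and the failed length check reported above.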

ADD REPLY • link 10.2 years ago Benilton Carvalho ★ 4.3k


cstrato ★ 3.9k

@cstrato-908


Austria

Dear Damian,

In principle you should not have a memory problem. However, 5500 exon arrays is quite a lot, so let me propose the following:

1. Do not run function rma() directly, but do it stepwise, i.e.:

    data.bg.rma <- bgcorrect.rma(data.exon, ...)
    data.qu.rma <- normalize.quantiles(data.bg.rma, ...)
    data.mp.rma <- summarize.rma(data.qu.rma, ...)

   You can find an example in script examples/script4exon.R (at line 750). In this way you will not lose all your computation if anything goes wrong at one step. You may also need to set 'add.data=FALSE' in summarize.rma(); otherwise all expression data will be imported, which causes a memory problem, too.

   Another way to run rma() stepwise is to use function express(), see the example in script examples/script4exon.R (at line 785). When using function express() you could set parameter 'bufsize=4000', which will reduce the basket size for each tree and thus consume less RAM.

2. I would suggest using only 6 exon arrays first to see if everything works fine. Then I would try to run 50 exon arrays in order to:
   - see if there is an initial memory problem
   - estimate how long each step will take if you run all 5500 arrays (approximately time x 110)

3. Please run everything with 'verbose=TRUE' so that you can see the output interactively. Maybe you could pipe the output to a text file.

4. Since you assume that there may be a memory problem: maybe you can run top (or something else) and check RSIZE/VSIZE from time to time. Maybe you can create a script which exports the memory consumption, e.g. every 10 min.

5. I am not sure if running the code on a cluster is a good idea. Do you run your code on a node which is used exclusively for this purpose? My suggestion would be to run your code on a machine where nothing else is running, since I assume that for 5500 exon arrays you will need at least one week (but see point 2).
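The pilot-run estimate from point 2 can be sketched in base R. The helper `estimate_full_runtime()` and the file names are hypothetical, the 2-hour pilot time is a placeholder rather than a measurement, and the extrapolation assumes that run time grows linearly with the number of arrays, which is what the "time x 110" rule of thumb implies (5500 / 50 = 110).

```r
# Hypothetical helper: extrapolate the run time of the full batch from a pilot run,
# assuming that run time scales linearly with the number of arrays.
estimate_full_runtime <- function(pilot_seconds, n_pilot, n_full) {
  stopifnot(pilot_seconds >= 0, n_pilot > 0, n_full >= n_pilot)
  pilot_seconds * (n_full / n_pilot)
}

# Illustrative file names; in practice this would be the real list of CEL files.
cel_files <- sprintf("array_%04d.CEL", 1:5500)
pilot_6   <- head(cel_files, 6)    # first check: does everything work at all?
pilot_50  <- head(cel_files, 50)   # second check: memory behaviour and timing

# Placeholder: replace with the elapsed time measured for one step on the
# 50-array pilot, e.g. taken from system.time(...)[["elapsed"]].
pilot_seconds <- 2 * 3600

scale_factor <- length(cel_files) / length(pilot_50)
full_seconds <- estimate_full_runtime(pilot_seconds, length(pilot_50), length(cel_files))

cat("scale factor:", scale_factor, "\n")
cat("estimated full run:", round(full_seconds / 86400, 1), "days\n")
```

Steps whose cost does not grow linearly with the number of arrays would make this estimate too optimistic, so it is best read as a lower bound.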
(Note: In 2009 a customer was running 23000 HGU-133_Plus2 arrays on one machine, and with his help I could eliminate (hopefully) all memory problems, some of which appeared only after 2000 arrays. In his case memory consumption initially increased to 7.8 GB, but after solving the memory problems it remained at 3.0 GB.)

Best regards,
Christian

_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
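The periodic memory check suggested in point 4 above could be scripted in R itself. This is a minimal sketch that assumes a Linux system with a /proc filesystem (the platform reported in the original post); `read_memory_kb()`, `log_memory()` and the log file are hypothetical names. In practice the logger would run in a second R session and be given the process ID of the session doing the computation.

```r
# Hypothetical helper: read resident (VmRSS) and virtual (VmSize) memory of a
# process in kB from /proc/<pid>/status (Linux only).
read_memory_kb <- function(pid = Sys.getpid()) {
  status <- readLines(file.path("/proc", pid, "status"))
  get_kb <- function(key) {
    line <- grep(paste0("^", key, ":"), status, value = TRUE)
    as.numeric(gsub("[^0-9]", "", line[1]))
  }
  c(rss_kb = get_kb("VmRSS"), vsz_kb = get_kb("VmSize"))
}

# Append 'n' timestamped samples to a log file, 'interval' seconds apart
# (600 seconds corresponds to the "every 10 min" suggestion).
log_memory <- function(pid, logfile, n = 1, interval = 600) {
  for (i in seq_len(n)) {
    mem  <- read_memory_kb(pid)
    line <- sprintf("%s\t%s\t%.0f\t%.0f",
                    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    as.character(pid), mem[["rss_kb"]], mem[["vsz_kb"]])
    cat(line, "\n", sep = "", file = logfile, append = TRUE)
    if (i < n) Sys.sleep(interval)
  }
  invisible(logfile)
}

# Demo: take a single sample of the current R session.
logfile <- tempfile(fileext = ".log")
log_memory(Sys.getpid(), logfile, n = 1)
print(readLines(logfile))
```

Because the log is written to disk after every sample, the last recorded values survive even if the queuing system kills the session that is being watched.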

ADD COMMENT • link 10.2 years ago cstrato ★ 3.9k


Stephen Piccolo ▴ 590

@stephen-piccolo-6761


United States

Hi Damian,

I receive the digest version of the BioC mailing list, so I apologize if someone already gave this reply, but various Bioconductor packages are designed for processing very large Affy data sets. Our own SCAN.UPC package as well as the fRMA package normalize one sample at a time and thus can be applied to data sets of any size. Another option would be the aroma.affymetrix package, which is designed for doing memory-efficient RMA normalization.

Hope that helps! If you end up trying SCAN.UPC, you might also try the option for processing multiple samples in parallel, which you should be able to do on a computer cluster.

Regards,
-Steve
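Single-sample methods scale to any number of arrays because only one array has to be in memory at a time. The base-R sketch below illustrates that pattern only: `normalize_one()` is a hypothetical stand-in (a plain log2 transform of a small text file), not the SCAN.UPC or fRMA API, and `parallel::mclapply()` is shown as one possible way to spread samples over cores on a Unix-like system.

```r
library(parallel)

# Hypothetical stand-in for a single-sample normalizer: read one file of raw
# intensities, transform it, write the result, and return only the output path.
normalize_one <- function(infile, outdir) {
  raw     <- scan(infile, quiet = TRUE)
  outfile <- file.path(outdir, paste0(basename(infile), ".norm.txt"))
  writeLines(format(log2(raw), digits = 6), outfile)
  outfile   # nothing large is returned, so memory use does not grow with sample count
}

# Create three tiny fake input files so that the sketch runs on its own.
indir  <- tempfile("raw_");  dir.create(indir)
outdir <- tempfile("norm_"); dir.create(outdir)
infiles <- file.path(indir, sprintf("sample_%d.txt", 1:3))
for (f in infiles) writeLines(as.character(c(64, 128, 256)), f)

# Each sample is processed independently; mc.cores = 1 runs them one after another,
# a larger value would process several samples at once.
outfiles <- unlist(mclapply(infiles, normalize_one, outdir = outdir, mc.cores = 1))
print(basename(outfiles))
```

With real data, the peak memory of such a loop is roughly the cost of one sample times the number of cores in use, independent of the total number of arrays.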

ADD COMMENT • link 10.2 years ago Stephen Piccolo ▴ 590
