About big data sets for the affy R package


刘伟 ▴ 30

@-5667

Last seen 9.6 years ago

Dear Buddy,

I am a user of the affy R package. When I try to process a large number (approx. 300) of microarrays, I always get a memory allocation error from R. I searched the web but did not find any solution for ReadAffy() with a large data set. I do not know whether the problem can be fixed in some way. Any suggestion is appreciated. Thanks.

Sincerely,
Wei Liu

affy affy • 1.3k views

ADD COMMENT • link updated 11.3 years ago by Stephen Piccolo ▴ 590 • written 11.3 years ago by 刘伟 ▴ 30


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

Hi Wei Liu,

You can try justRMA(). If that doesn't work, you can try the aroma.affymetrix package. Note that aroma.affymetrix is not part of Bioconductor and has its own user group and repository, so you will need to search the web for it.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
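A minimal sketch of the justRMA() route suggested above, assuming all CEL files sit in one directory. The helper names `list_cel_files` and `run_just_rma` and the directory path are hypothetical; `justRMA()` and its `celfile.path` argument come from the affy package. As far as I understand it, justRMA() goes from CEL files to an ExpressionSet without keeping a full AffyBatch around, which is why it tends to need less memory than ReadAffy() followed by rma().

```r
# Hypothetical helper: list CEL files (plain or gzipped) in a directory.
list_cel_files <- function(cel_dir) {
  list.files(cel_dir, pattern = "\\.cel(\\.gz)?$",
             ignore.case = TRUE, full.names = TRUE)
}

# Sketch: compute RMA expression values for every CEL file in `cel_dir`.
# Assumes the affy package and the CDF package for the array type are installed.
run_just_rma <- function(cel_dir) {
  cel_files <- list_cel_files(cel_dir)
  stopifnot(length(cel_files) > 0)
  message(length(cel_files), " CEL files found in ", cel_dir)
  # With no file names given, justRMA() reads the CEL files it finds
  # under celfile.path.
  affy::justRMA(celfile.path = cel_dir)
}
```

Usage would look like `eset <- run_just_rma("path/to/cel_files")`, where the path is a placeholder. Whether 300 arrays fit still depends on the array type and the RAM available.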

ADD COMMENT • link 11.3 years ago James W. MacDonald 65k


cstrato ★ 3.9k

@cstrato-908

Last seen 5.5 years ago

Austria

Dear Wei Liu,

You could use the Bioconductor package xps, which can handle a couple of thousand microarrays on computers with only 1-2 GB of RAM.

See also http://www.bioconductor.org/help/workflows/oligo-arrays/#pre-processing-resources for other packages that might be relevant.

Regards,
Christian

ADD COMMENT • link 11.3 years ago cstrato ★ 3.9k


Rob Dunne ▴ 230

@rob-dunne-292

Last seen 9.6 years ago

Hi Wei Liu,

If they are Affymetrix 1.0 ST exon arrays, I can send you a modified version of read.celfiles from the oligo package that should read a 300-microarray data set. I don't know if it will work for other array types; possibly not without some work. The modified read.celfiles uses the big.matrix class from the bigmemory package:

    my.data <- read.celfiles(filenames=ff, useAffyio=FALSE)
    my.data
    #assayData: 6553600 features, 335 samples
    #Annotation: pd.huex.1.0.st.v2

Bye,
Rob

--
Rob Dunne
Fax: +61 2 9325 3200  Tel: +61 2 9325 3263
CSIRO Mathematics, Informatics and Statistics  +61 2 9325 3100
Locked Bag 17, North Ryde, New South Wales, Australia, 1670
http://www.bioinformatics.csiro.au  Email: Rob.Dunne at csiro.au

Java has certainly revolutionized marketing and litigation.

ADD COMMENT • link 11.3 years ago Rob Dunne ▴ 230


Hi Rob,

It looks like you're running an old version of oligo. Today, our approach is:

    library(ff)
    library(oligo)
    my.data <- read.celfiles(<cel file names>)

HTH,
b

ADD REPLY • link 11.3 years ago Benilton Carvalho ★ 4.3k


Hi Benilton,

Unless I am missing something, ff won't help in this case. From the ff help page:

"Currently ff objects cannot have length zero and are limited to '.Machine$integer.max' elements"

and .Machine$integer.max is 2^31 - 1. This is exceeded when you try to load 328 Affy exon arrays, hence:

    library(ff)
    library(oligo)
    data <- read.celfiles(filenames=files)
    #Loading required package: pd.huex.1.0.st.v2
    #Loading required package: RSQLite
    #Loading required package: DBI
    #Platform design info loaded.
    #Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
    #  missing value where TRUE/FALSE needed
    #In addition: Warning message:
    #In ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),  :
    #  NAs introduced by coercion

    traceback()
    #4: ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),
    #       basename(name)))
    #3: createFF("intensities-", dim = c(nr, length(filenames)))
    #2: smartReadCEL(filenames, sampleNames, headdetails = headdetails)
    #1: read.celfiles(filenames = ff)

This is why I went down the path of modifying read.celfiles to use big.matrix, which does not have the 2^31 - 1 limit.

Bye,
Rob

    sessionInfo()
    #R version 2.15.0 (2012-03-30)
    #Platform: x86_64-unknown-linux-gnu (64-bit)
    #
    #locale:
    # [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
    # [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
    # [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
    # [7] LC_PAPER=C                 LC_NAME=C
    # [9] LC_ADDRESS=C               LC_TELEPHONE=C
    #[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
    #
    #attached base packages:
    #[1] tools     stats     graphics  grDevices utils     datasets  methods
    #[8] base
    #
    #other attached packages:
    #[1] pd.huex.1.0.st.v2_3.6.0 RSQLite_0.11.2          DBI_0.2-5
    #[4] oligo_1.20.4            oligoClasses_1.18.0     ff_2.2-10
    #[7] bit_1.1-9
    #
    #loaded via a namespace (and not attached):
    # [1] affxparser_1.28.1     affyio_1.24.0         Biobase_2.16.0
    # [4] BiocGenerics_0.2.0    BiocInstaller_1.4.9   Biostrings_2.24.1
    # [7] codetools_0.2-8       compiler_2.15.0       foreach_1.4.0
    #[10] IRanges_1.14.4        iterators_1.0.6       preprocessCore_1.18.0
    #[13] splines_2.15.0        stats4_2.15.0         zlibbioc_1.2.0
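The limit Rob describes can be checked with plain arithmetic. This sketch uses the 6553600 features per array reported in the assayData printout earlier in the thread; the variable names are illustrative only. It shows that 327 exon arrays still fit under `.Machine$integer.max` elements while 328 do not, and that forcing the element count into an integer yields NA, which is presumably where the "NAs introduced by coercion" warning in the log comes from.

```r
features_per_array <- 6553600          # from the assayData printout in this thread
max_elements <- .Machine$integer.max   # 2^31 - 1 = 2147483647

# Largest number of arrays whose intensities fit in one ff object
max_arrays <- floor(max_elements / features_per_array)
max_arrays
# 327

# The products are doubles, so there is no integer overflow in these checks
features_per_array * 327 <= max_elements   # TRUE  (2143027200 elements)
features_per_array * 328 >  max_elements   # TRUE  (2149580800 elements)

# Coercing the too-large count to integer gives NA (with a warning)
is.na(suppressWarnings(as.integer(features_per_array * 328)))   # TRUE
```

So with this array type, any data set of 328 or more CEL files would hit the limit regardless of how much RAM or disk is available.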

ADD REPLY • link 11.3 years ago Rob Dunne ▴ 230


Stephen Piccolo ▴ 590

@stephen-piccolo-6761

Last seen 3.6 years ago

United States

Wei,

I'm assuming your end goal is to normalize the files? If so, there are a few other options you could try for a large number of CEL files. You could process the CEL files in smaller groups. Alternatively (and in my opinion, a better approach), you could use our SCAN.UPC package (or the frma package); both are designed to normalize one file at a time, so you only need enough memory to process a single file.

Regards,
-Steve
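A minimal sketch of the "smaller groups" idea, in base R only. The names `chunk_files` and `process_in_chunks`, the file names, and the `process_fun` argument are all hypothetical; `process_fun` stands in for whatever normalization step is used on each group (for example a wrapper around a single-array method from SCAN.UPC or frma, which would correspond to a group size of 1).

```r
# Split a vector of CEL file paths into groups of at most `chunk_size` files.
chunk_files <- function(cel_files, chunk_size) {
  stopifnot(chunk_size >= 1)
  unname(split(cel_files, ceiling(seq_along(cel_files) / chunk_size)))
}

# Apply `process_fun` (a placeholder for the normalization step) to each
# group in turn, so only one group is handled at a time.
process_in_chunks <- function(cel_files, chunk_size, process_fun) {
  lapply(chunk_files(cel_files, chunk_size), process_fun)
}

# Illustration with made-up file names and a dummy processing step
# that just counts the files in each group.
cel_files <- sprintf("sample_%03d.CEL", 1:300)
group_sizes <- process_in_chunks(cel_files, chunk_size = 50, process_fun = length)
unlist(group_sizes)
# 50 50 50 50 50 50
```

One caveat on the design choice: multi-array methods such as RMA depend on which arrays are normalized together, so values from separately processed groups are not directly comparable. That is the practical argument for the single-array methods mentioned above.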

ADD COMMENT • link 11.3 years ago Stephen Piccolo ▴ 590
