problems with affy in unix


feiwan ▴ 20


Dear Bioconductor users:

I encountered two problems with affy on Unix, and I have one further question. The dataset is the leukemia data from Stjude.org.

Question 1:

I tried to merge two big AffyBatch data sets in R (on a Unix mainframe) and ran into the following problem:

    > combine2.3 <- merge(combine2.1, TALL)
    Error: cannot allocate vector of size 409600 Kb

Question 2:

The AffyBatch BCR does not have a name for each sample (only "H"):

    > setwd("/export/home/fwan/data/BCR")
    > BCR <- ReadAffy()
    > BCR
    AffyBatch object
    size of arrays=640x640 features (51204 kb)
    cdf=HG_U95Av2 (12625 affyids)
    number of samples=16
    number of genes=12625
    annotation=hgu95av2
    > exprs(BCR)[1,]
       H    H    H    H    H    H    H    H    H    H    H    H    H    H    H    H
     687  859  883  608  711  567  827  572  621  607  565  802 1292  583  683  659

Question 3:

There are 335 CEL files. If R cannot process all of them at the same time, can I break them into different groups and then run RMA on each group? I know the final expression values will be different, but I do not know whether that will have a big effect on the final data analysis.

I am very new to this area. Any suggestions will be appreciated.

Regards,

w.f


ADD COMMENT • link 20.8 years ago feiwan ▴ 20


James W. MacDonald 65k


In reply to feiwan's message of 07/18/03, 11:41PM:

1. You don't have enough memory to merge the two. 335 CEL files will take a TON of memory to deal with.

2. No idea on this one. What are the names of the CEL files?

3. If you are just going to do RMA, then you could use justRMA instead. You may have enough memory to do it, because this function is much less memory intensive.

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
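To put rough numbers on the memory point, here is a back-of-the-envelope sketch using only the figures quoted in the question. It assumes each intensity is stored as an 8-byte double (R's usual numeric type). The reading of the failed allocation as a 128-array intensity matrix is an inference from the arithmetic, not something stated in the thread.

```r
# Back-of-the-envelope memory arithmetic from the numbers in the question.
# Assumption: one 8-byte double per feature (cell) on each array.
cells_per_array <- 640 * 640                        # features on one 640x640 array
kb_per_array    <- cells_per_array * 8 / 1024       # Kb needed for one array

kb_16_arrays <- 16 * kb_per_array                   # compare with "51204 kb" printed for BCR
arrays_in_failed_vector <- 409600 / kb_per_array    # arrays that would fill the failed vector
mb_all_335 <- 335 * kb_per_array / 1024             # Mb for ONE copy of all 335 arrays

print(c(kb_per_array = kb_per_array,
        kb_16_arrays = kb_16_arrays,
        arrays_in_failed_vector = arrays_in_failed_vector,
        mb_all_335 = mb_all_335))
```

Under that assumption one array takes 3200 Kb, so 16 arrays take 51200 Kb (close to the 51204 kb reported for BCR), and the 409600 Kb vector that could not be allocated is the size of a 128-array intensity matrix. A single copy of all 335 arrays would need roughly 1047 Mb, before any intermediate copies R makes along the way.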

ADD COMMENT • link 20.8 years ago James W. MacDonald 65k


In reply to James MacDonald's message of 7/19/2003, 10:29 AM -0400:

Thanks for all of the information. I was also faced with not enough memory, but was able to obtain expression values using justRMA. From the documentation, it looks like this is what I want for normalization, but I am not sure how the expression values are derived from the normalized probes.

This is the set of expresso options I would have used:

    expression = expresso(data, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="liwong")

Thanks,

Naomi Altman

Naomi S. Altman
Associate Professor, Dept. of Statistics
Penn State University

ADD REPLY • link 20.8 years ago Naomi Altman ★ 6.0k


On Mon, 21 Jul 2003, Naomi Altman wrote:

> Thanks for all of the information. I was also faced with not enough
> memory, but was able to obtain expression values using justRMA. From the
> documentation, it looks like this is what I want for normalization, but I
> am not sure how the expression values are derived from the normalized probes.
>
> This is the set of expresso options I would have used:
>
> expression = expresso(data, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="liwong")

This is not what justRMA gives you. justRMA is equivalent to

    expression = expresso(data, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="medianpolish")

so medianpolish instead of liwong.
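The two routes compared above can be written side by side. This is only a sketch: it assumes the Bioconductor affy package is installed and that `cel_dir` (a hypothetical argument name) is a directory containing the CEL files. Both wrapper function names are made up for illustration, and nothing is read until one of them is called.

```r
# Sketch only; 'rma_via_justRMA', 'rma_via_expresso' and 'cel_dir' are
# hypothetical names. Requires the Bioconductor 'affy' package when called.

# Route 1: justRMA reads the CEL files and computes RMA values directly,
# without first building a full AffyBatch in memory.
rma_via_justRMA <- function(cel_dir) {
  affy::justRMA(celfile.path = cel_dir)
}

# Route 2: build the AffyBatch with ReadAffy, then run expresso with the
# settings the thread gives as the equivalent of justRMA
# (median polish summary, not Li-Wong).
rma_via_expresso <- function(cel_dir) {
  batch <- affy::ReadAffy(celfile.path = cel_dir)
  affy::expresso(batch,
                 bgcorrect.method = "rma",
                 normalize.method = "quantiles",
                 pmcorrect.method = "pmonly",
                 summary.method   = "medianpolish")
}
```

Swapping `summary.method` to `"liwong"` in the second function would give the Li-Wong summary from the earlier message, but that route still has to hold the whole AffyBatch in memory.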

ADD REPLY • link 20.8 years ago Rafael A. Irizarry ★ 2.3k


feiwan ▴ 20


Thanks, Dr. James:

For the 2nd question, ReadAffy did go wrong. I tried justRMA on the same CEL files of class BCR, and the samples have names instead of just "H". The sample names are the CEL file names.

For the 3rd question, I still need some help. I have 335 CEL files in total. I tried justRMA on all of them, and it always fails when it reads the 127th file, saying something like:

    could not open Hyperdip-5-C.cel

(The problem is not with this CEL file: I deleted it, and justRMA failed again at the new 127th file with the same error message.)

My question is: if I break the 335 files into 3 groups (approximately 110 CEL files per group, each group containing a few of the leukemia classes defined by St. Jude Children's; there are 9 classes: BCR, NORMAL, HYPERDODIP, MLL, etc.), run justRMA on each group, and merge the final expression files later, is this process justified for the later data analysis? I know this expression file will differ from the one I would get by running justRMA on all 335 CEL files at once. However, upgrading the mainframe might not be possible right now.

Thanks again for your help. Have a good weekend.

Regards,

w.f
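The three-group split described above could be sketched in base R as follows. Both helper names are hypothetical, the file names are made up, and the `summarize` argument stands in for whatever produces expression values for one group (for example, a wrapper around justRMA that returns a probeset-by-sample matrix). No CEL files are read in this illustration.

```r
# Hypothetical helper: cut a vector of CEL file names into n_groups
# contiguous chunks of nearly equal size.
split_into_groups <- function(files, n_groups) {
  stopifnot(n_groups >= 1, length(files) >= n_groups)
  group_id <- ceiling(seq_along(files) * n_groups / length(files))
  unname(split(files, group_id))
}

# Hypothetical helper: apply a user-supplied function to each group and bind
# the resulting probeset-by-sample matrices column-wise.
summarize_by_group <- function(groups, summarize) {
  do.call(cbind, lapply(groups, summarize))
}

# Illustration with made-up file names.
cel_files   <- sprintf("sample%03d.CEL", 1:335)
groups      <- split_into_groups(cel_files, 3)
group_sizes <- lengths(groups)
print(group_sizes)  # 111 112 112
```

Each group is normalized and summarized on its own here, so, as the question already notes, the merged values will not be identical to those from a single run over all 335 files.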

ADD COMMENT • link 20.8 years ago feiwan ▴ 20


In reply to fwan's message of Sat, 19 Jul 2003:

Do you mind sending the CEL file names so we can see what could be causing the problem? Also send the version numbers of R and affy.

Breaking the files up would be the easiest thing. The larger the groups, the better.

If you don't mind leaving R, you may try Ben Bolstad's RMAExpress: http://stat-www.berkeley.edu/users/bolstad/RMAExpress/RMAExpress.html

Finally, a strategy I recently heard Terry Speed suggest is to "train" RMA on, say, 50 chips (get probe-effect estimates and an "average distribution" from Bolstad's quantile normalization), then go chip by chip, quantile normalizing each chip to the average distribution and fitting a robust procedure to a linear model with known probe effects. This requires some coding, but it would solve your problem in a more elegant way. By the way, this strategy should work for other multi-chip measures such as Li and Wong's.
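A toy base-R sketch of the "train, then go chip by chip" idea follows. All function names are hypothetical and none of this is affy code. It makes two assumptions that the message does not specify: probe effects are taken from Tukey's median polish on the training chips, and the "robust procedure" for a new chip is simply the median of its probe-effect-corrected log intensities. The last three lines work on a single probe set.

```r
# "Average distribution": mean of the sorted intensities across the
# training chips (rows = probes, columns = chips).
reference_distribution <- function(train) {
  rowMeans(apply(train, 2, sort))
}

# Quantile-normalize one new chip to a fixed reference distribution
# (reference sorted ascending, one value per probe).
normalize_to_reference <- function(x, reference) {
  stopifnot(length(x) == length(reference))
  reference[rank(x, ties.method = "first")]
}

# Probe effects for one probe set from normalized log2 training data
# (rows = probes, columns = chips). Assumption: median polish row effects.
train_probe_effects <- function(log_train) {
  stats::medpolish(log_train, trace.iter = FALSE)$row
}

# Expression value of one new chip for that probe set, given known probe
# effects. Assumption: the median is used as the robust estimator.
chip_expression <- function(log_new, probe_effects) {
  median(log_new - probe_effects)
}

# Toy data: two training "chips" with three probes each.
ref        <- reference_distribution(cbind(c(3, 1, 2), c(6, 4, 5)))
normalized <- normalize_to_reference(c(50, 10, 30), ref)

# Perfectly additive probe + chip effects for one probe set.
log_train <- outer(c(-1, 0, 1), c(8, 9, 10), "+")
effects   <- train_probe_effects(log_train)
estimate  <- chip_expression(12 + c(-1, 0, 1), effects)
```

Because the reference distribution and the probe effects are fixed after training, each further chip can be processed alone, so memory use no longer grows with the number of CEL files.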

ADD REPLY • link 20.8 years ago Rafael A. Irizarry ★ 2.3k
