RMA normalization when using subsets of samples

Sylvia.Merk@ukmuenster.de

The original message was scrubbed from the mailing-list archive (attachment: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060214/34ac66fe/attachment.pl); it is quoted in full in Wolfgang Huber's reply below.

E Motakis, Mathematics

Dear all,

I am looking for freely downloadable two-color cDNA data that contain gene replicates on the same slide (not repeated experiments). Could anyone please tell me where to find data of this type? I have checked the Stanford Microarray Database, but I don't think the data I found there are what I am looking for.

Thanks,
Makis

E Motakis, Mathematics, E.Motakis at bristol.ac.uk

Nolwenn LeMeur

Hi Makis,

Have you looked at the Gene Expression Omnibus (GEO) web site? Here is a microarray platform with reporters spotted in quadruplicate: http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GPL1309

Hope it helps,
Nolwenn

Nolwenn Le Meur, PhD, Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA

(Replying to E Motakis's message of Tue, 14 Feb 2006.)

Adaikalavan Ramasamy

On a similar subject, does anyone know of any studies done on the Affymetrix platform using technical replicates? Specifically, I am looking for datasets used for real biological applications rather than those produced for methodological purposes (e.g. calibration, reproducibility, or spike-in studies).

Thank you.

Regards, Adai

(Replying to Nolwenn LeMeur's message of Tue, 2006-02-14 10:25 -0800.)

martin.schumacher@novartis.com

The original message was scrubbed from the mailing-list archive (attachment: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060215/82767dbb/attachment.pl); it is quoted in full in Larry La Pointe's reply below.

Ron Ophir

Dear all,

I think that D.R. Goldstein has tried to answer Sylvia's question in http://ludwig-sun2.unil.ch/~darlene/ms/prRMA.pdf

Ron

(Replying to Larry La Pointe's message of 02/15/06 11:55 AM.)

Wolfgang Huber

Dear Sylvia,

this might not be the answer that you want to hear, but for the end result it should not matter (substantially) which of the two possibilities you take, and I would be worried if it did.

Best wishes
Wolfgang

Wolfgang Huber, European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge CB10 1SD, England, www.ebi.ac.uk/huber

Sylvia.Merk at ukmuenster.de wrote:
> Dear bioconductor list,
>
> I have a question concerning RMA normalization:
>
> There are, for example, 200 CEL files, and the clinicians have several research questions, each concerning only a subset of the 200 samples, with the possibility that some samples are included in more than one question.
>
> There are two possibilities to normalize the CEL files:
>
> 1. Read all 200 chips into an AffyBatch object, normalize all 200 chips together, and then analyze the required subset.
>
> 2. Read only the required chips into an AffyBatch object, normalize these chips, and then perform the further analysis. I think that this approach is the better one, but it has the disadvantage that some samples are included in several normalizations, ending up with different gene expression levels for a single sample.
>
> What is (from a statistician's view) the appropriate approach to normalize CEL files in this case?
>
> Thank you in advance
> Sylvia
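
Sylvia's concern about option 2, that a sample shared between subsets ends up with different expression values, follows from how quantile normalization works: each array is mapped onto a reference distribution (the mean of the sorted values at each rank) computed from all arrays in the batch, so a different batch gives a different reference. The sketch below is a toy in plain Python on made-up numbers; `quantile_normalize` is a hypothetical helper, not the affy/preprocessCore code, which among other things handles ties and operates on background-corrected probe intensities.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a batch of arrays (each a list of intensities).

    Each array's k-th smallest value is replaced by the mean, across the
    batch, of the k-th smallest values. Ties are not treated specially.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of each rank across the whole batch.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Indices of this array's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up intensities for four samples (A, B, C, D), four probes each.
batch = [
    [2.0, 4.0, 6.0, 8.0],      # A
    [1.0, 3.0, 5.0, 7.0],      # B
    [3.0, 7.0, 9.0, 11.0],     # C
    [10.0, 12.0, 14.0, 16.0],  # D
]
a_in_full_batch = quantile_normalize(batch)[0]  # option 1: all arrays
a_in_subset = quantile_normalize(batch[:2])[0]  # option 2: A and B only
print(a_in_full_batch)  # [4.0, 6.5, 8.5, 10.5]
print(a_in_subset)      # [1.5, 3.5, 5.5, 7.5]
```

Sample A keeps the same ordering of its probes in both runs, but its normalized values depend on which other arrays it was processed with.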

Adaikalavan Ramasamy

This would be a problem if one or more of the resulting subsets is small and contains outliers.

My preference is to preprocess all arrays together. My reasoning is that doing so gives the RMA median polish step (and, to a lesser extent, the quantile normalisation step) much more information to work with.

Regards, Adai

(Replying to Wolfgang Huber's message of Wed, 2006-02-15 17:16 +0000.)
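
Adai's point concerns the summarization step: RMA combines the probes of each probe set across all arrays in the batch with Tukey's median polish, so the estimated probe effects, and with them each array's summary, depend on which other arrays are present. The sketch below follows the structure of R's `stats::medpolish` (defaults `maxiter = 10`, `eps = 0.01`) in plain Python; Bioconductor's RMA uses its own implementation, so details such as the convergence rule may differ. The matrix is made up: three probes of one hypothetical probe set on three arrays, on the log2 scale.

```python
from statistics import median


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey's median polish of a probes x arrays matrix (list of rows).

    Fits x[i][j] = overall + row_eff[i] + col_eff[j] + residual[i][j]
    and returns overall + col_eff[j] for each array j, i.e. one
    summarized log2 value per array for this probe set.
    """
    z = [[float(v) for v in row] for row in matrix]  # residuals
    n_rows, n_cols = len(z), len(z[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Move each probe's (row's) median into its row effect.
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Move each array's (column's) median into its column effect.
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once the total absolute residual has stabilized.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]


# Made-up log2 intensities: 3 probes (rows) x 3 arrays (columns).
probes = [
    [8.0, 9.0, 13.0],
    [6.0, 7.0, 7.0],
    [10.0, 11.0, 11.0],
]
full = median_polish(probes)                                  # arrays 0, 1, 2
subset = median_polish([[row[0], row[2]] for row in probes])  # arrays 0, 2
print(full)    # [8.0, 9.0, 9.0]
print(subset)  # [10.0, 11.0]
```

Array 0 is summarized as 8.0 in the full batch but as 10.0 when processed with array 2 alone. With only two arrays each probe's median is just the mean of two values, so the high first-probe reading on array 2 is no longer downweighted, and the re-estimated probe effects shift array 0's summary.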

Naomi Altman

I only wish that Wolfgang's answer matched my experience. It does seem to matter. I don't think there is a statistical answer to your question, but as a statistician, I do feel more comfortable preprocessing all arrays together.

--Naomi

(Replying to Adaikalavan Ramasamy's message of 07:01 PM 2/14/2006.)

Naomi S. Altman, Associate Professor, Dept. of Statistics, Penn State University, University Park, PA 16802-2111

Larry La Pointe

Dear Martin,

We have run up to 550 chips with a reasonable processing time: not more than an hour or so. The practical limits seem to be set more by machine RAM and R's memory management. RMA normalization of 550 chips occupies about 12 GB on our quad-processor Opteron-based system.

Larry

Lawrence LaPointe, CSIRO Bioinformatics for Human Health, Sydney, Australia

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of martin.schumacher at novartis.com
Sent: Wed 2/15/2006 7:43 PM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] RMA normalization when using subsets of samples

> Dear Colleagues,
>
> Greetings from Switzerland!
>
> I agree with the statements of Wolfgang and Adai. Using all chips will certainly put you on the safe side. I wonder what you feel would be the minimal number of chips for "stable" RMA processing (meaning that a larger set would give essentially the same results). People from GeneLogic told me that about 20 chips are sufficient. Is it possible to run RMA using Bioconductor with 200 chips and get the results back within a reasonable time?
>
> Best regards,
> Martin
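
Larry's figure of about 12 GB for 550 chips can be put in perspective with a back-of-envelope estimate. Larry does not say which array type was used, so this sketch assumes, hypothetically, HG-U133 Plus 2.0 arrays with 1164 × 1164 probe cells, and that the probe-level intensities are held as 8-byte doubles (R's numeric type). Under those assumptions one dense cells × chips copy of the data is about 5.6 GiB, so the reported 12 GB would correspond to roughly two such copies in memory at once. The helper `matrix_gib` is hypothetical.

```python
def matrix_gib(n_cells, n_chips, bytes_per_value=8):
    """Size in GiB of one dense cells x chips matrix of 8-byte doubles."""
    return n_cells * n_chips * bytes_per_value / 2**30


# Assumption: HG-U133 Plus 2.0 layout, 1164 x 1164 probe cells per chip.
cells_per_chip = 1164 * 1164
per_chip_mib = cells_per_chip * 8 / 2**20
one_copy_gib = matrix_gib(cells_per_chip, 550)        # Larry's 550 chips
martin_gib = matrix_gib(cells_per_chip, 200)          # Martin's 200 chips
print(cells_per_chip)          # 1354896
print(round(per_chip_mib, 1))  # 10.3
print(round(one_copy_gib, 2))  # 5.55
print(round(martin_gib, 2))    # 2.02
```

The estimate scales linearly with the number of chips, which fits Larry's observation that available RAM, rather than CPU time, is the practical limit.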
