Coercing Normalized Data to exprSet

Barry Henderson ▴ 250

@barry-henderson-49

Last seen 11.4 years ago

Dear List,

I have a set of two-color array data I am trying to analyze with BioConductor. The experiment is a loop design. I have read the data in and conormalized it, leaving me with a marrayNorm object. I have coerced that object into an exprSet object and I am now trying to understand how to filter/test on that object. Since the individual channels of the two-color array data have been collapsed into a single log ratio in the exprSet object, it is not obvious how to assign covariates to individual channels (samples).

Am I missing something? Is this a challenge in dealing with loop designs? Or is this a limitation of BioConductor with respect to this experimental design?

Can I calculate the normalized expression, write the values out, and then read them back in as an exprSet? If so, is there a facile way to handle this process? I've been through the docs and vignettes, and worked with the eset provided with Biobase, but it simply isn't obvious to me how to work with two-color loop designs.

Thanks in advance for any advice. I have pasted excerpts of the marrayNorm (normalized.data) and exprSet (tox2) objects I am working with below. As you can see, 45 arrays get collapsed into 45 samples...

As an added note, I have calculated normalized intensity values and written them out for input into maanova, but I would like to understand how to do this in BioConductor if possible.

Barry Henderson

    > normalized.data
    Normalized intensity data: Object of class marrayNorm.

    Number of arrays: 45 arrays.

    A) Layout of spots on the array:
    Array layout: Object of class marrayLayout.

    Total number of spots:        2688
    Dimensions of grid matrix:    4 rows by 4 cols
    Dimensions of spot matrices:  12 rows by 14 cols

    Currently working with a subset of 2688 spots.

    Control spots:
    There are 2 types of controls :
    Control  normal
        208    2480

    Notes on layout:
    C:/Tox2/genes.txt

    B) Samples hybridized to the array:
    Object of class marrayInfo.

      maLabels  # of slide  Names            Experiment Cy3  Experiment Cy5
    1 34-108-1  34-108-1    34-108-1.Rinput  Wyeth           Bezafibrate
    2 34-108-2  34-108-2    34-108-2.Rinput  Lovastatin      Wyeth
    3 ...

    =====================

    > tox2
    Expression Set (exprSet) with
        2688 genes
        45 samples
    phenoData object with 6 variables and 45 cases
    varLabels
        : # of slide
        : Names
        : Experiment Cy3
        : Experiment Cy5
        : date
        : Comments

Biobase PROcess maanova • 1.6k views

ADD COMMENT • link 23.0 years ago Barry Henderson ▴ 250

Barry Henderson ▴ 250

Wolfgang,

Thanks! A big help. I understand the normalization process and had been looking at Sandrine's paper, but I had not come up with the specific derivations. For completeness, and to help in understanding the inner workings of BioConductor, I am also interested in direct normalization of the entire R and G matrices. Can you provide a little more insight into that process? I've tried a couple of direct attempts using maNormMain, but it is not obvious from your initial response and it is not well covered in the docs.

Thanks again,
Barry

-----Original Message-----
From: Wolfgang Huber [mailto:w.huber@dkfz-heidelberg.de]
Sent: Tuesday, February 04, 2003 12:44 PM
To: Barry Henderson
Cc: Sandrine Dudoit
Subject: RE: [BioC] Coercing Normalized Data to exprSet

Hi Barry,

Sandrine: I cc you to make sure what I say is OK!

It is described in the paper "Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments" by Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow, and Terence P. Speed.

If R is the matrix of the log2 of the background-corrected unnormalized red intensities, and G the corresponding matrix for the green intensities (rows = spots, columns = chips), then the unnormalized M and A are:

    M = R - G
    A = 1/2 (R + G)

Normalization in the approach of marrayNorm involves some kind of manipulation of the matrix M, e.g. subtracting something from each column. If you need to, you can solve the above equations for R and G:

    R = (M + 2A)/2
    G = (2A - M)/2

and obtain the "normalized" R and G. As I said, other normalization methods work on the R and G matrices directly; it depends on what you want to do.

Best regards,
Wolfgang

Division of Molecular Genome Analysis (Poustka)
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 580
69120 Heidelberg, Germany
w.huber@dkfz.de
http://www.dkfz.de/abt0840/whuber
Tel +49-6221-424709
Fax +49-6221-42524709

> -----Original Message-----
> From: Barry Henderson [mailto:barry.henderson@ribonomics.com]
> Sent: Tuesday, February 04, 2003 6:30 PM
> To: Wolfgang Huber
> Subject: RE: [BioC] Coercing Normalized Data to exprSet
>
> Wolfgang
>
> Thanks for the response. Not meaning to ask an obvious question, but
> how does
>
>     x = new('exprSet', exprs = cbind(M-A, M+A))
>
> get me to normalized, logged R and G values? Sorry, I'm not a trained
> statistician, just trying to pick it up on the fly.
>
> Barry
>
> -----Original Message-----
> From: Wolfgang Huber [mailto:w.huber@dkfz-heidelberg.de]
> Sent: Tuesday, February 04, 2003 12:02 PM
> To: Barry Henderson
> Subject: RE: [BioC] Coercing Normalized Data to exprSet
>
> Hi Barry,
>
> The (M,A) representation of the data is very good for pairwise
> comparison, but in some cases other representations may be more
> useful. You could go back to the (normalized, logged) R and G values
> and construct an exprSet with twice as many columns as chips, with each
> pair of columns containing the R and G values. Schematically, this is
> done by
>
>     x = new('exprSet', exprs = cbind(M-A, M+A))
>
> (modulo signs and factors). There are also normalization methods that
> normalize the whole matrix cbind(Rf-Rb, Gf-Gb) [as in marrayRaw]
> simultaneously, rather than chip-by-chip. I've no clear idea about the
> pros and cons, but just to mention it.
>
> Best regards
> Wolfgang
>
> Division of Molecular Genome Analysis
> German Cancer Research Center (DKFZ)
> Im Neuenheimer Feld 580
> 69120 Heidelberg, Germany
>
> w.huber@dkfz.de
> http://www.dkfz.de/abt0840/whuber
> Tel +49-6221-424709
> Fax +49-6221-42524709
>
> > -----Original Message-----
> > From: bioconductor-admin@stat.math.ethz.ch
> > [mailto:bioconductor-admin@stat.math.ethz.ch] On Behalf Of Barry
> > Henderson
> > Sent: Tuesday, February 04, 2003 3:47 PM
> > To: bioconductor@stat.math.ethz.ch
> > Subject: [BioC] Coercing Normalized Data to exprSet
> >
> > Dear List
> > [...]
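The back-transformation described in this exchange can be sketched in base R. This is only an illustration under stated assumptions: the intensities are made-up toy values, the median-centering step is a stand-in for whatever normalization marrayNorm actually applied, and the names `M_norm`, `R_norm`, `G_norm` and `channels` are hypothetical, not part of any package. It also spells out the "modulo signs and factors" remark: the per-channel values are A + M/2 and A - M/2.

```r
# Toy log2 intensities: 3 spots (rows) x 2 arrays (columns). Values are made up.
R <- matrix(c(10, 11, 12,  9, 10, 13), nrow = 3,
            dimnames = list(paste0("spot", 1:3), c("array1", "array2")))
G <- matrix(c( 9, 11, 10, 10,  8, 12), nrow = 3, dimnames = dimnames(R))

# Forward transform to the (M, A) representation.
M <- R - G
A <- (R + G) / 2

# Stand-in for normalization (an assumption): subtract each array's median M
# from its column. Real normalization may depend on intensity or print tip.
M_norm <- sweep(M, 2, apply(M, 2, median))

# Back-transform: R = (M + 2A)/2 = A + M/2 and G = (2A - M)/2 = A - M/2.
R_norm <- A + M_norm / 2
G_norm <- A - M_norm / 2

# One column per channel, so twice as many columns as arrays.
channels <- cbind(R_norm, G_norm)
colnames(channels) <- c(paste0(colnames(R), ".R"), paste0(colnames(G), ".G"))
```

A matrix like `channels` is what would be passed as the expression values when building a per-channel object; only M changes under this scheme, so A is the same before and after normalization.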


Barry Henderson ▴ 250

Vince,

Once I have the exprSet created with all the covariates, I'll get back in touch with you regarding the test statistic and filters.

Thanks,
Barry

-----Original Message-----
From: Vincent Carey 525-2265 [mailto:stvjc@channing.harvard.edu]
Sent: Tuesday, February 04, 2003 1:44 PM
To: Barry Henderson
Cc: Vincent Carey 525-2265
Subject: RE: [BioC] Coercing Normalized Data to exprSet

> Vince, thanks for responding.
>
> You are correct in that I am asking "How do I use the covariate
> information in the "experiment Cy3" and "Experiment Cy5" phenoData
> slots to filter and test."
>
> In coercing a marrayNorm object into an exprSet it is just not obvious
> to me how to proceed; R is a second language. The log ratios
> represent 2 samples, not one, and operating on multiple covariates
> simultaneously sounds pretty nasty. Would it not be better if the
> coercion method kept the samples separate in this case? I am
> appending one full loop as Read into my sample so you can see the
> relationships better. Thanks for your input.

Now that I have seen the dialogue with Wolfgang, it looks as if you have an alternative coercion that gets you the separate channel data. So the problem remains to craft a filter or test function that computes the contrasts you are interested in.

It seems that the new("exprSet" ... expression that was given needs to be expanded a bit so that the appropriate phenoData are attached, and in fact those phenoData need to be created.

I have to run off for a little while, but if in the meantime you get a chance to write down the gene-specific test statistic you would like to compute on the basis of the loop, getting the pheno data and writing the filter will be pretty straightforward.
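The per-channel covariates mentioned above could be laid out roughly as follows in base R. This is a hedged sketch, not the thread's actual solution: the two rows come from the annotation excerpt in the original post, the data frame and column names (`arrays`, `channel_info`, `dye`, `treatment`) are hypothetical, and it assumes the red channel is Cy5 and the green channel is Cy3. Attaching the table to an exprSet as phenoData is left out because that constructor call is not shown in the thread.

```r
# Array-level annotation for the first two slides of the loop
# (taken from the excerpt in the original post).
arrays <- data.frame(
  slide = c("34-108-1", "34-108-2"),
  Cy3   = c("Wyeth", "Lovastatin"),
  Cy5   = c("Bezafibrate", "Wyeth"),
  stringsAsFactors = FALSE
)

# One row per channel, ordered to match cbind(R, G):
# all red (assumed Cy5) channels first, then all green (assumed Cy3) channels.
channel_info <- data.frame(
  slide     = rep(arrays$slide, times = 2),
  dye       = rep(c("Cy5", "Cy3"), each = nrow(arrays)),
  treatment = c(arrays$Cy5, arrays$Cy3),
  stringsAsFactors = FALSE
)
rownames(channel_info) <- paste(channel_info$slide, channel_info$dye, sep = ".")

# Group channel labels by treatment, e.g. as a starting point for contrasts.
by_treatment <- split(rownames(channel_info), channel_info$treatment)
```

With one row per channel, each sample carries a single treatment label, so a gene-wise filter or test can index columns by `treatment` instead of juggling the Cy3 and Cy5 covariates of a log ratio at once.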

