normalization and analysis of connected designs
Ramon Diaz ★ 1.1k
@ramon-diaz-159
Last seen 7.1 years ago
Dear All,

Suppose we have an experiment with cDNA microarrays with the structure:

A -> B -> C -> D

(i.e., A and B hybridized in the same array, A with Cy3, B with Cy5; B and C in the same array, with B with Cy3, etc).

In this design, and if we use log_2(R/G), testing A == D is straightforward since A and D are connected and we can express D - A as the sum of the log ratios in the three arrays.

But suppose we use some non-linear normalization of the data, such as loess as in Yang et al. 2002 (package marrayNorm) or the variance stabilization method of Huber et al., 2002 (package vsn). Now, the values we have after the normalization are no longer log_2(R/G) but something else (that changes with, e.g., log_2(R*G)). Doesn't this preclude the simple "just add the ratios"? Is there something obvious I am missing?

Thanks,

Ramón

--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://bioinfo.cnio.es/~rdiaz
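For readers who want to see the "just add the ratios" step spelled out, here is a minimal Python sketch for one gene in the idealized, noise-free case. The abundances and the per-array scale factors are made-up numbers, not data from the thread; the only thing taken from the post is the design (Cy3 sample, then Cy5 sample, on each array).

```python
import math

def log_ratio(red, green):
    """M-value of one spot: log2(R/G)."""
    return math.log2(red / green)

# Hypothetical true abundances of one gene in the four samples.
abundance = {"A": 100.0, "B": 200.0, "C": 50.0, "D": 400.0}

# Chain design from the post: (Cy3 sample, Cy5 sample, array-specific scale).
# The scale is an assumed multiplicative array effect; it cancels in R/G.
chain = [("A", "B", 1.0), ("B", "C", 2.0), ("C", "D", 0.5)]

m_values = [
    log_ratio(scale * abundance[cy5], scale * abundance[cy3])
    for cy3, cy5, scale in chain
]

# The sum telescopes: (B - A) + (C - B) + (D - C) = D - A on the log2 scale.
d_minus_a = sum(m_values)
direct = math.log2(abundance["D"] / abundance["A"])

print(m_values, d_minus_a, direct)
```

In this toy case the chained sum and the direct log2(D/A) agree because nothing but a per-array scale factor distorts the intensities; the question in the post is what happens once that is no longer true.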
Normalization Cancer • 766 views
@gordon-smyth
Last seen 29 minutes ago
WEHI, Melbourne, Australia
At 02:27 AM 2/07/2003, Ramon Diaz wrote:
>But suppose we use some non-linear normalization of the data, such as loess as
>in Yang et al. 2002 (package marrayNorm) or the variance stabilization method
>of Huber et al., 2002 (package vsn). Now, the values we have after the
>normalization are no longer log_2(R/G) but something else (that changes with,
>e.g., log_2(R*G)). Doesn't this preclude the simple "just add the ratios"?
>Is there something obvious I am missing?

Yes, you have misunderstood the point of the normalization step, which is to bring the log-ratios back to the scale on which the log-ratios have the expected linear relationships. If the log-ratios already obeyed the relationships such as D-A = AB+BC+CD straight off the arrays then we wouldn't normalize!

Gordon
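The point above can be illustrated with a small Python sketch. It is only a toy: it assumes a constant dye bias per array and removes it with global median normalization, which stands in here for the loess and vsn methods named in the thread (those also handle intensity-dependent bias). All numbers are hypothetical.

```python
from statistics import median

def median_normalize(m_values):
    """Global median normalization: subtract the array-wide median log-ratio."""
    centre = median(m_values)
    return [m - centre for m in m_values]

# Hypothetical true log2-ratios for five genes on the three arrays of the
# chain (array 1: B - A, array 2: C - B, array 3: D - C). Most genes are
# unchanged, so the true per-array median is 0.
true_m = [
    [0.0, 0.0, 1.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, -0.5],
    [0.5, 0.0, 0.0, -1.0, 0.0],
]

# Assumed constant dye bias added to every log-ratio on each array.
dye_bias = [0.5, -0.25, 0.75]
observed = [[m + b for m in arr] for arr, b in zip(true_m, dye_bias)]
normalized = [median_normalize(arr) for arr in observed]

# Per-gene chained estimate of D - A: sum over the three arrays.
true_chain = [sum(gene) for gene in zip(*true_m)]
raw_chain = [sum(gene) for gene in zip(*observed)]
norm_chain = [sum(gene) for gene in zip(*normalized)]

print(true_chain)
print(raw_chain)   # off by the accumulated bias
print(norm_chain)  # additivity restored
```

Before normalization every chained estimate is off by the sum of the three biases; after normalization the chained values match the true D - A, which is the sense in which normalization restores the linear relationships rather than breaking them.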
> Yes, you have misunderstood the point of the normalization step, which is
> to bring the log-ratios back to the scale on which the log-ratios have the
> expected linear relationships. If the log-ratios already obeyed the
> relationships such as D-A = AB+BC+CD straight off the arrays then we
> wouldn't normalize!

Oooops, I shouldn't have missed that! Thank you very much for your answer.

Best,

Ramón
@wolfgang-huber-3550
Last seen 11 weeks ago
EMBL European Molecular Biology Laborat…
Hi Ramon,

You can use the chaining of "generalized log-ratios" (from vsn) in the same way as you suggested for the log-ratios. You can see the transformation that vsn makes as resulting in shrunken estimates of the log-ratios. The advantage over log-ratios is that you don't have to worry so much about the dependence of their variance on the mean (e.g. log(R*G)).

Another point: It may not always be true that

[1] h_3G - h_3R + h_2G - h_2R + h_1G - h_1R

is a better estimate for the D-A comparison than

[2] h_3G - h_1R

Here, h_3G is the green channel on array 3, h_1R the red on array 1, and so on. For good arrays, [2] should have a three times lower variance. However, [1] may be able to correct for spotting irregularities between the chips. Thus which is better depends on the data and the quality of the chips. You may want to try both.

Best regards

Wolfgang
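The "three times lower variance" figure can be checked with a short calculation. This is a sketch under an assumption that is not stated in the post: each transformed channel value h carries an independent error with common variance \sigma^2, and the two channels of an array share no error component.

```latex
\begin{align*}
\operatorname{Var}\bigl([1]\bigr)
  &= \operatorname{Var}\Bigl(\sum_{i=1}^{3} \bigl(h_{iG} - h_{iR}\bigr)\Bigr)
   = 6\sigma^2, \\
\operatorname{Var}\bigl([2]\bigr)
  &= \operatorname{Var}\bigl(h_{3G} - h_{1R}\bigr)
   = 2\sigma^2, \\
\frac{\operatorname{Var}([1])}{\operatorname{Var}([2])} &= 3 .
\end{align*}
```

If the two channels of a spot do share an error component, it cancels within each difference in [1] but not in [2], so the ratio of three holds only when that shared component is negligible.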
Dear Wolfgang,

Thank you very much for your answer. A couple of things I don't see:

> Another point: It may not always be true that
>
> [1] h_3G - h_3R + h_2G - h_2R + h_1G - h_1R
>
> is a better estimate for the D-A comparison than
>
> [2] h_3G - h_1R
>
> Here, h_3G is the green channel on array 3, h_1R the red on array 1, and
> so on. For good arrays, [2] should have a three times lower variance.
> However, [1] may be able to correct for spotting irregularities between
> the chips. Thus which is better depends on the data and the quality of the
> chips. You may want to try both.

I am not sure I follow this. I understand that, __if__ D and A had been hybridized in the same array, then the variance of their comparison would be a third of the variance of the comparison having to use the (two-step) connection between A and D. But I am not sure I see how we can directly do

h_3G - h_1R

(if this were possible, then there would be no need to use connected designs.)

The way I was seeing the above setup was:

from h_3 we can estimate phi_3 = D - C (as the mean log ratio from the arrays of type 3),
from h_2, phi_2 = C - B
from h_1, phi_1 = B - A

phi_1, phi_2, and phi_3 are the three basic estimable effects.

Since I want D - A, I estimate that from the linear combination of the phis (which here is just the sum of the phis).

This is doing it "by hand"; I think that if we use a setup such as the ANOVA approach of Kerr, Churchill and collaborators (or Wolfinger et al), we end up doing essentially the same (we eventually get the "VG" effects), and we still need a connected design.

So either way, I don't get to see how we can directly do

h_3G - h_1R

But then, maybe I am missing something obvious again...

Best,

Ramón
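The "by hand" construction above (express the wanted contrast as a signed sum of per-array log-ratios, which is only possible if the design is connected) can be sketched in Python as a graph search. The function name `contrast_path` and the representation of a design as (Cy3 sample, Cy5 sample) pairs are illustrative choices, not anything from limma or the other packages mentioned in this thread.

```python
from collections import deque

def contrast_path(arrays, start, target):
    """Breadth-first search over the design graph.

    `arrays` is a list of (cy3_sample, cy5_sample) pairs; array i measures
    M_i = log2(R/G), i.e. cy5 - cy3 on the log scale. Returns a list of
    (array_index, sign) such that sum(sign * M_i) estimates target - start,
    or None if the two samples are not connected by any chain of arrays.
    """
    neighbours = {}
    for idx, (cy3, cy5) in enumerate(arrays):
        neighbours.setdefault(cy3, []).append((cy5, idx, +1))  # adds +M_idx
        neighbours.setdefault(cy5, []).append((cy3, idx, -1))  # adds -M_idx
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, idx, sign in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(idx, sign)]))
    return None

chain_design = [("A", "B"), ("B", "C"), ("C", "D")]
disconnected_design = [("A", "B"), ("C", "D")]

print(contrast_path(chain_design, "A", "D"))         # uses all three arrays
print(contrast_path(disconnected_design, "A", "D"))  # no path: not estimable
```

For the chain design the search returns the three arrays with sign +1, i.e. phi_1 + phi_2 + phi_3; for a design with no chain between A and D it returns None, which is the log-ratio view of why a connected design is needed.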
Hi Ramon,

What makes the difference between D and A hybridized on the same array, and on different arrays? It is

(a) the between-array variation (e.g. because each time the spotter puts down a drop of DNA it is a little bit different, or because the arrays had different surface treatments, etc.), and
(b) the between-hybridization variation (e.g. different temperatures, different volumes of the reaction chamber).

These two sources of variation need to be compared to other sources, e.g.

(c) between-RNA-extraction,
(d) between-reverse-transcription,
(e) between-labeling,
(f) between-dyes.

(c)-(f) are present no matter whether D and A are on one array or on different ones.

That it is possible to make (a) and (b) small is shown by the fact that useful results have been obtained through single-color arrays such as Affy or Nylon membranes. Whether in your experiment (a) and (b) are small compared to (c)-(f) depends on your particular experiment. If they are, you are better off with h_3G - h_1R than with the full chain of summands. I have seen examples where this seemed to be the case.

Anyone else?

Best regards

Wolfgang
Dear Ramon,

I am with you. The direct comparison design you describe is a very sensible type of design which is intended to compare RNA samples using within-spot comparisons, i.e., log-ratios or M-values. The limma package in Bioconductor is specifically designed to analyse experiments of this type. You're quite correct that you do need a connected design in order to compare all the RNA types in this way.

Wolfgang is arguing for what in my lab we call a 'single-channel analysis'. The main proponents of single-channel analysis in the literature are Russ Wolfinger at SAS and Gary Churchill at the Jax lab. As far as I am aware there is no software in Bioconductor designed to do single-channel analysis of cDNA arrays. We (I mean here Jean, Sandrine who wrote the marray packages and I) don't yet provide single-channel software because we consider it to be an experimental methodology whose validity is still to be established. Normalization of single-channel data in particular is something that we are still trying to do a satisfactory job of. The only discussion of single-channel normalization for cDNA data that I am aware of in the literature is Yang and Thorne, see below.

Wolfinger and Churchill fit mixed linear models in which a spot is a random effect. One then has multiple error strata corresponding to spots, to individual channel intensities within spots and perhaps to arrays as well. There are certainly cases where one can get more information out of this approach than analysing arrays entirely using log-ratios. Statistically, the method consists of using random effects to recover information from the between-spot error strata. The real problem is to know when it is valid to take this approach and when it is not.

I may have misinterpreted Wolfgang, but he does seem to be proposing an even more radical approach in which the spot error stratum is ignored entirely. (I think that is the only way one could get the calculation that the D-A variance is reduced by a third.) This is more radical than anything I've seen in the literature, and I don't personally think it would be a good approach for cDNA microarray data.

Regards
Gordon

Yang, Y. H., and Thorne, N. P. (2003). Normalization for two-color cDNA microarray data. In: D. R. Goldstein (ed.), Science and Statistics: A Festschrift for Terry Speed, IMS Lecture Notes - Monograph Series, Volume 40, pp. 403-418.
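To make the role of the spot error stratum concrete, here is a small Python simulation. It is a sketch under a deliberately simple assumed model, not the mixed models of Wolfinger or Churchill: on each array both channels of a spot share a random spot effect with standard deviation `tau`, and each channel adds independent noise with standard deviation `sigma`. The true sample differences are set to zero because they do not affect the variances. The code uses the dye assignment of the original post (A on Cy3 of array 1, D on Cy5 of array 3), so the direct contrast is written as red on array 3 minus green on array 1; the h_3G - h_1R notation used in this thread corresponds to the opposite labelling.

```python
import random
from statistics import pvariance

def analytic_variances(sigma, tau):
    """(chained, direct) variances under the assumed model."""
    chained = 6 * sigma ** 2                 # spot effects cancel within arrays
    direct = 2 * sigma ** 2 + 2 * tau ** 2   # two different spots contribute
    return chained, direct

def simulate(sigma, tau, n_rep=20000, seed=1):
    """Monte Carlo estimate of the (chained, direct) variances for D - A."""
    rng = random.Random(seed)
    chained, direct = [], []
    for _ in range(n_rep):
        spots = []
        for _array in range(3):
            spot = rng.gauss(0.0, tau)            # shared by both channels
            green = spot + rng.gauss(0.0, sigma)
            red = spot + rng.gauss(0.0, sigma)
            spots.append((green, red))
        # Chained estimate: sum of within-spot differences on the 3 arrays.
        chained.append(sum(red - green for green, red in spots))
        # Direct estimate: D channel of array 3 minus A channel of array 1.
        direct.append(spots[2][1] - spots[0][0])
    return pvariance(chained), pvariance(direct)

for tau in (0.0, 2.0):
    print(tau, analytic_variances(1.0, tau), simulate(1.0, tau))
```

With `tau = 0` the direct contrast has about a third of the variance of the chain, which matches the "reduced by a third" calculation; once the spot effect is as large as or larger than the channel noise, the direct contrast becomes the noisier of the two, so which estimate is preferable depends on the relative size of the two components.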
Dear Gordon,

Thank you very much for your comments and discussion of Wolfgang's message, and for clarifying some issues (about Wolfinger's and Churchill's approaches) which I thought I understood, but I didn't. Thanks a lot for the reference, too.

Interesting about Churchill's approach, though, is that his paper in Nature Genetics (2002, 32: 490-495) makes all comparisons either within-array or using connected designs and, for example, one of his papers with K. Kerr (Kerr & Churchill, 2001, Biostatistics, 2: 183-201) says explicitly that "in order to fit models such as (4.1), (4.2) and (4.3) a design should be connected" (p. 8 of the technical report; 4.1 to 4.3 are the usual ANOVA models of the Churchill group).

I'll have to do some more reading. This is getting a lot messier than I thought.

Best,

Ramón
At 10:06 PM 3/07/2003, Ramon Diaz-Uriarte wrote:
>Interesting about Churchill's approach, though, is that his paper in Nature
>Genetics (2002, 32: 490-495) makes all comparisons either within-array or
>using connected designs and, for example one of his papers with K. Kerr (Kerr
>& Churchill, 2001, Biostatistics, 2: 183-201) says explicitly that "in order
>to fit models such as (4.1), (4.2) and (4.3) a design should be connected"
>(p. 8 of the technical report; 4.1 to 4.3 are the usual ANOVA models of the
>Churchill group).

Well, I think Gary's Nature Genetics paper was purely on design and didn't actually discuss analysis. I think he made the important point that the comparisons you are most interested in should be arranged so that they are made as directly as possible, either on the same arrays or via as few intermediaries as possible.

In the Biostatistics paper, they said that you need a connected design in order to be able to estimate gene x array interactions, and this is very true. Allowing gene x array interactions pretty much implies that whatever analysis you do will be equivalent to analysing the log-ratios. I think one of the strengths of Gary's work is that he has made a lot of issues explicit by writing them down in models, and amongst other things this goes a long way towards clarifying the relationship between single-channel analysis and analysis of log-ratios. It's not for me to interpret Gary's papers though, he can do it far better himself!

I think we'll see more single-channel analyses in the future but it does require more care and employs more complex statistical models than analysis of log-ratios. The paired structure introduced by the competitive hybridization (in classical statistical design terms it's like a split plot design) does make the analysis of cDNA arrays more complex than that of one-channel or Affy arrays.

Regards
Gordon
Dear Wolfgang,

Thanks a lot for your answer. I have to confess I am still confused. I understand the points you raise about the sources of variation. But I think there is a fundamental issue we have not addressed: the values we get for G and R are coming from a competitive hybridization experiment, and thus the value of h_3G is meaningful only when we relate it to whatever was in h_3R, and thus I don't think it is some absolute measure.

__If__ the competitive hybridization nature of the experiment is as relevant as I believe it is, then I think we __must__ go through the chain of connected experiments to get an estimate, regardless of the relative magnitude of the sources of variation.

I need to go back, re-read a few papers, check the reference that Gordon has suggested, and think about your message carefully, but this is the way I see the issue for now.

Best, and thanks again for the comments and great discussion,

Ramón
Ramon Diaz ★ 1.1k
@ramon-diaz-159
Last seen 7.1 years ago
Dear Xavi,

Thanks for the comment; that option (as well as Wolfgang's comments) seems to me a puzzling possibility... It would be really nice, but I am not sure I see how one would be able to do it (see also Gordon Smyth's comments in this thread).

By the way, is there any package for quantile normalization for cDNA arrays?

Best,

Ramón

On Wednesday 02 July 2003 18:04, Xavier Solé wrote:
> If you use a quantile normalization and have each channel replicated at
> least twice you may be able to do comparisons of the intensities of
> different channels, even though they are not connected.
>
> Regards,
>
> Xavi.
Xavier Solé ▴ 20
@xavier-sole-372
Last seen 7.1 years ago
If you use quantile normalization and have each channel replicated at least twice, you may be able to compare the intensities of different channels directly, even though they are not connected.

Regards,

Xavi.

----- Original Message -----
From: "Ramon Diaz-Uriarte" <rdiaz@cnio.es>
To: <w.huber@dkfz-heidelberg.de>; "bioconductor" <bioconductor@stat.math.ethz.ch>
Sent: Wednesday, July 02, 2003 5:52 PM
Subject: Re: [BioC] normalization and analysis of connected designs

> Dear Wolfgang,
>
> Thank you very much for your answer. A couple of things I don't see:
>
> > Another point: It may not always be true that
> >
> > [1] h_3G - h_3R + h_2G - h_2R + h_1G - h_1R
> >
> > is a better estimate for the D-A comparison than
> >
> > [2] h_3G - h_1R
> >
> > Here, h_3G is the green channel on array 3, h_1R the red on array 1, and
> > so on. For good arrays, [2] should have a three times lower variance.
> > However, [1] may be able to correct for spotting irregularities between
> > the chips. Thus which is better depends on the data and the quality of the
> > chips. You may want to try both.
>
> I am not sure I follow this. I understand that, __if__ D and A had been
> hybridized in the same array, then the variance of their comparison would be
> a third of the variance of the comparison having to use the (two-step)
> connection between A and D. But I am not sure I see how we can directly do
> h_3G - h_1R
> (if this were possible, then there would be no need to use connected
> designs).
>
> The way I was seeing the above setup was:
> from h_3 we can estimate phi_3 = D - C (as the mean log ratio from the arrays
> of type 3),
> from h_2, phi_2 = C - B
> from h_1, phi_1 = B - A
> phi_1, phi_2, and phi_3 are the three basic estimable effects.
>
> Since I want D - A, I estimate that from the linear combination of the phis
> (which here is just the sum of the phis).
>
> This is doing it "by hand"; I think that if we use a setup such as the ANOVA
> approach of Kerr, Churchill and collaborators (or Wolfinger et al.), we end up
> doing essentially the same (we eventually get the "VG" effects), and we still
> need a connected design.
>
> So either way, I don't get to see how we can directly do
> h_3G - h_1R
>
> But then, maybe I am missing something obvious again...
>
> Best,
>
> Ramón
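The "by hand" estimate described in the quoted message (average the log-ratios within each array type, then add the three phis) can be sketched as follows. This is only an illustration under stated assumptions: the function name `chain_contrast`, the dictionary layout, and the numbers are hypothetical, and the log-ratios are assumed to be already normalized and oriented as red minus green, so that array type 1 measures B - A, type 2 measures C - B, and type 3 measures D - C.

```python
import numpy as np

def chain_contrast(log_ratios):
    """Estimate D - A for one gene in a chain design A -> B -> C -> D.

    log_ratios maps an array type (1, 2, 3) to the normalized log-ratios
    (red minus green) observed for that gene on the replicate arrays of
    that type. Returns the per-type means (the phis) and their sum.
    """
    # phi_k is the mean log-ratio over the replicate arrays of type k
    phis = {k: float(np.mean(v)) for k, v in log_ratios.items()}
    # D - A = (B - A) + (C - B) + (D - C) = phi_1 + phi_2 + phi_3
    return phis, sum(phis.values())

# Hypothetical values for a single gene, two replicate arrays per type
example = {
    1: np.array([0.5, 0.7]),   # B - A
    2: np.array([-0.2, 0.0]),  # C - B
    3: np.array([1.0, 1.2]),   # D - C
}
phis, d_minus_a = chain_contrast(example)
```

The same contrast would come out of a linear model fitted to all arrays at once; the explicit sum is shown only because it mirrors the reasoning in the message.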
0
Entering edit mode
Xavier Solé ▴ 20
@xavier-sole-372
Last seen 7.1 years ago
We have seen that the effect of competitive hybridization is not so relevant for cDNA microarrays. In fact, Affy arrays hybridize just one sample per chip.

To perform quantile normalization, look at the LIMMA package, Ramon.

Cheers,

Xavi.

----- Original Message -----
From: "Ramon Diaz-Uriarte" <rdiaz@cnio.es>
To: "Xavier Solé" <x.sole@ico.scs.es>; <w.huber@dkfz-heidelberg.de>; "bioconductor" <bioconductor@stat.math.ethz.ch>
Sent: Thursday, July 03, 2003 1:31 PM
Subject: Re: [BioC] normalization and analysis of connected designs

> Dear Xavi,
>
> Thanks for the comment; that option (as well as Wolfgang's comments) seems to
> me a puzzling possibility... It would be really nice, but I am not sure I see
> how one would be able to do it (see also Gordon Smyth's comments in this
> thread).
>
> By the way, is there any package for quantile normalization for cDNA arrays?
>
> Best,
>
> Ramón
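For readers unfamiliar with the idea, quantile normalization of single-channel intensities can be sketched in a few lines. This is a generic illustration, not the LIMMA implementation (in current limma releases the routine to look at is `normalizeBetweenArrays` with `method = "quantile"`). The function name `quantile_normalize` and the input layout (rows are spots, columns are individual channels such as 1G, 1R, 2G, 2R) are assumptions, and ties are handled by order of appearance rather than by averaging.

```python
import numpy as np

def quantile_normalize(x):
    """Force every column of x to share the same empirical distribution.

    x: 2-D array, rows = spots, columns = channels (log intensities).
    The common distribution is the mean, across columns, of the sorted values.
    """
    x = np.asarray(x, dtype=float)
    # position of each entry in the sorted order of its own column
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # reference[k] = mean of the k-th smallest value over all columns
    reference = np.sort(x, axis=0).mean(axis=1)
    # replace each entry by the reference value of the same rank
    return reference[ranks]

# Hypothetical log intensities: 4 spots measured in 3 channels
x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 7.0, 6.0],
              [4.0, 2.0, 8.0]])
normalized = quantile_normalize(x)
```

After this step every channel has identical quantiles, which is what would make cross-array, cross-channel comparisons conceivable. Whether such comparisons are trustworthy still depends on how much spot-to-spot variation remains.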
0
Entering edit mode
Dear Xavi,

Thanks for your last two emails. I see your point, but it is my understanding (which has improved a lot thanks to comments from Gordon Smyth) that one of the important reasons for dealing with ratios in cDNA arrays is controlling spot-to-spot variation. In fact, this is mentioned explicitly in several papers (e.g., Yang & Thorne, 2003, p. 405). So, regardless of the importance of competitive hybridization, spot-to-spot variation is always there.

I am not that familiar with Affy, but I think that, because their setup is very different (e.g., multiple probes per clone), the direct analogy "if we do it with Affy we ought to be able to do it with cDNA" does not really hold just like that.

By the way, the paper of Yang & Thorne, which Gordon Smyth mentioned in a previous email, contains discussion of single-channel normalization for cDNA, and Natalie Thorne presented a very interesting talk at the last RSS meetings dealing with single-channel normalization. However, if I understand correctly, there are still some issues that need to be investigated more fully for single-channel normalization and they are working on it.

Yang, Y. H., and Thorne, N. P. (2003). Normalization for two-color cDNA microarray data. In: D. R. Goldstein (ed.), Science and Statistics: A Festschrift for Terry Speed, IMS Lecture Notes - Monograph Series, Volume 40, pp. 403-418.

Best,

Ramón
0
Entering edit mode
Hi Ramon,

the package "vsn" has implemented single-channel normalization for cDNA microarrays since the end of 2001 and was published at ISMB 2002. We have run considerable comparison studies on data to validate it, where it compared favorably with log-ratio based methods. One of these comparisons is reported in that paper.

The importance of spot-to-spot variation (relative to other kinds of variation) needs to be carefully considered. As a first approximation, the following calculation was pointed out to me by Gordon Smyth:

Assume you have n arrays, with red and green values R1, G1, R2, ..., Rn, Gn on the logarithmic scale. Assume var(Rk) = var(Gk) = sigma^2, corr(Gk, Rk) = rho and corr(Gk, Rj) = 0 if j != k. Then compare

(1) var(R1 - Gn) = 2 * sigma^2
(2) var{ (R1-G1) - (R2-G2) - ... - (Rn-Gn) } = 2n * (1-rho) * sigma^2

Whether (1) is larger than (2) or the other way round depends on n and the size of rho.

Best regards

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/mga/whuber
-------------------------------------
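The comparison of (1) and (2) can be checked numerically. The sketch below is an illustration under the assumptions stated in the post plus two further ones that are not in the original: the values are drawn as jointly normal, and n is at least 2 (for n = 1, R1 and Gn sit on the same array and (1) no longer applies). The function names are hypothetical. The chain is written as a sum of within-array differences; because different arrays are uncorrelated, the signs between the terms do not change the variance.

```python
import numpy as np

def var_direct(sigma2):
    """Variance (1): R1 - Gn, two values taken from different arrays."""
    return 2.0 * sigma2

def var_chain(n, rho, sigma2):
    """Variance (2): combination of n within-array differences Rk - Gk."""
    return 2.0 * n * (1.0 - rho) * sigma2

def simulate(n, rho, sigma2, n_sim=100_000, seed=0):
    """Monte Carlo estimate of both variances under a bivariate normal model."""
    rng = np.random.default_rng(seed)
    cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
    # shape (n_sim, n, 2): the last axis holds (R_k, G_k) for array k
    rg = rng.multivariate_normal([0.0, 0.0], cov, size=(n_sim, n))
    direct = rg[:, 0, 0] - rg[:, -1, 1]               # R_1 - G_n
    chain = (rg[:, :, 0] - rg[:, :, 1]).sum(axis=1)   # sum of (R_k - G_k)
    return float(direct.var()), float(chain.var())

# With n = 3 arrays the two variances are equal at rho = 1 - 1/n = 2/3
sim_direct, sim_chain = simulate(n=3, rho=0.8, sigma2=1.0)
```

In closed form, (2) is smaller than (1) exactly when n * (1 - rho) < 1, that is, when rho > 1 - 1/n. For a chain of three arrays the within-spot correlation therefore has to exceed 2/3 before the connected estimate beats the direct single-channel difference.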
0
Entering edit mode
Dear Wolfgang,

> the package "vsn" has implemented single-channel normalization for cDNA
> microarrays since the end of 2001 and was published at ISMB 2002. We have
> run considerable comparison studies on data to validate it, where it
> compared favorably with log-ratio based methods. One of these comparisons
> is reported in that paper.

I apologize for not being clear: my response to Xavi regarding single-channel normalization was referring to single-channel normalization using quantile normalization. On rereading my email, I see this is not at all clear from it; Xavi had mentioned single-channel using quantile normalization in the thread I was responding to, and that is what I had in mind.

I have read two of your vsn papers (Bioinformatics, 2002, 18, S96-S104; Statistical Applications in Genetics and Molecular Biology, 2003, 2, article 3), and I have used the vsn package, but I understand that vsn is not using quantile normalization and that is why I did not mention it.

> The importance of spot-to-spot variation (relative to other kinds of
> variation) needs to be carefully considered. As a first approximation, the
> following calculation was pointed out to me by Gordon Smyth:
>
> Assume you have n arrays, with red and green values R1, G1, R2, ..., Rn, Gn
> on the logarithmic scale. Assume var(Rk) = var(Gk) = sigma^2,
> corr(Gk, Rk) = rho and corr(Gk, Rj) = 0 if j != k. Then compare
> (1) var(R1 - Gn) = 2 * sigma^2
> (2) var{ (R1-G1) - (R2-G2) - ... - (Rn-Gn) } = 2n * (1-rho) * sigma^2
>
> Whether (1) is larger than (2) or the other way round depends on n and the
> size of rho.

Thanks for pointing this out; I think I see the point.

Best,

Ramón
0
Entering edit mode
> I am not that familiar with Affy, but I think that, because their setup is
> very different (e.g., multiple probes per clone), the direct analogy "if we
> do it with Affy we ought to be able to do it with cDNA" does not really hold
> just like that.

I had also been wondering about this issue of why spotted chips must be referenced but Affy chips are not. I got a seemingly logical explanation from Jeff Townsend: Array to array variation per spot in the amount deposited, shape of the spot, etc., is extremely high for spotted arrays, even with the best spotters and protocols, therefore you always need expression levels to be referenced. Affy's photolithographic method is *supposed* to have much greater precision in the number & placement of probes per feature, making direct array to array comparisons possible.

Jenny Drnevich, Ph.D.
Department of Animal Biology
515 Morrill Hall
505 S Goodwin Ave
Urbana, IL 61801
USA

ph: 217-244-6826
fax: 217-244-4565
e-mail: drnevich@uiuc.edu
0
Entering edit mode
The reason that Affy arrays are not supposed to need a reference is that a known quantity of probe is placed on the array by fabrication. This is the same on every array. On spotted arrays, the material is deposited by the print tip, and so may vary from array to array.

Nonetheless, it is prudent to use a titration series for your spiking controls, to use for calibration. Alternatively, using a probe-wise normalization (see "expresso" in the affy library) should make all of your Affy arrays on the same scale.

---Naomi

Naomi S. Altman                      814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                  814-863-7114 (fax)
Penn State University                814-865-1348 (Statistics)
University Park, PA 16802-2111
0
Entering edit mode
Hi,

> The reason that Affy arrays are not supposed to need a reference is that a
> known quantity of probe is placed on the array by fabrication. This is the
> same on every array. On spotted arrays, the material is deposited by the
> print tip, and so may vary from array to array.

Even for spotted arrays, if the amount of DNA deposited on the array by the print-tip is well in excess of the reverse-transcribed DNA that is going to hybridize against it, the exact array-to-array variations of the deposited amount may not necessarily matter so much for the total measured fluorescence signal. This should be a valid assumption for low-expressed genes.

Best regards

Wolfgang Huber