Bioconductor Digest, Vol 28, Issue 23

Gordon Smyth, WEHI, Melbourne, Australia

Wolfgang,

Naomi is referring to what I call the "intraspot" correlation (see, for example, the intraspotCorrelation() function in the limma package), and it is critically important. The correlation isn't a bad thing, nor is it restricted to poor-quality arrays. Rather, it means that contrasts estimated within a spot are highly accurate. It is what makes the two-colour technology intrinsically more accurate than one-channel technology, other things being equal. See http://www.statsci.org/smyth/pubs/ISI2005-116.pdf for some discussion.

Basically, you're saying that if the arrays are very high quality, you can get away with an inefficient analysis. Why not do it properly and get the full benefit of the high-quality arrays? My experience is that high-quality Agilent arrays can beat Affy for accuracy if treated properly.

Gordon

>Date: Thu, 23 Jun 2005 15:29:38 +0100 (BST)
>From: "Wolfgang Huber" <huber at ebi.ac.uk>
>Subject: Re: [BioC] Agilent Arrays
>To: "Naomi Altman" <naomi at stat.psu.edu>
>Cc: bioconductor at stat.math.ethz.ch
>
>Hi Naomi,
>
>and why is that important? Also, what is the within-gene correlation
>between green foreground of array 1 and green foreground of array 2?
>
>Bw
> Wolfgang
>
><quote who="Naomi Altman">
> > I am working with Agilent arrays on which we have spotted many replicates
> > of the control spots.
> > The within-gene correlation between red and green foreground is about 0.8
> > for the unnormalized data, i.e. pretty high!
> >
> > --Naomi
> >
> > At 03:23 AM 6/23/2005, Wolfgang Huber wrote:
> >>Hi Claus,
> >>
> >>for the normalization of arrays where the spotting etc. variability
> >>between chips is not strong, you can treat the data from m two-colour
> >>arrays as if it were 2*m single-colour ones, and use methods like
> >>"quantiles" or "vsn".
> >>
> >>Note that for almost all genes, the hybridization is not limited by the
> >>amount of probe DNA, hence the competition between red and green target
> >>is negligible for almost all genes (except possibly the most highly
> >>expressed ones). This justifies treating a two-color array like two
> >>single-color arrays.
> >>
> >>Only later, when you consider the contrasts of interest for finding
> >>differentially expressed genes, do you want to make sure that these are
> >>not confounded with dye.
> >>
> >>PS, I think your question is very directly Bioconductor related!
> >>
> >>Best wishes
> >> Wolfgang
> >>
> >><quote who="Claus Mayer">
> >> > Dear all!
> >> >
> >> > Apologies for asking a question which is not directly Bioconductor
> >> > related: After some experience with spotted 2-channel arrays and
> >> > Affy data, I am currently analysing my first data set based on Agilent
> >> > arrays. I know that packages like marray or limma have facilities to
> >> > read these data and that they can be normalised and analysed like any
> >> > other 2-colour arrays. On the other hand, the printing technology of
> >> > these arrays (using inkjet printing of 60mer oligos) is closer in
> >> > spirit to Affy, if I understand this correctly. This seems to show in
> >> > the data as well. For example, the strongest correlations I found in
> >> > the single-channel (log-)intensities were not between the two channels
> >> > observed on the same slide (as with spotted arrays), but between the
> >> > two channels (differently dyed on different arrays in a loop design)
> >> > that contained the same sample (which is quite reassuring). This made
> >> > me wonder whether (once dye and array effects have been removed by some
> >> > normalisation method) with Agilent arrays one might really use
> >> > single-channel intensities as measures of gene expression instead of
> >> > reducing them to the log-ratio only, as is usually done for
> >> > two-channel data.
> >> >
> >> > This would have consequences on the way these arrays should be
> >> > normalised (rather by a multichip method than individually) and also
> >> > allow more flexibility in the design of experiments.
> >> >
> >> > As I said before, this is my first Agilent data set, so I would be
> >> > interested to hear opinions of others with more experience. Before I
> >> > start to re-invent the wheel here, I'd also be interested to know
> >> > whether any of you is aware of tools, software, papers, etc. dealing
> >> > with the analysis of Agilent array data specifically (rather than just
> >> > applying standard methods for 2-coloured cDNA arrays).
> >> >
> >> > Any help/comments appreciated
> >> >
> >> > Claus
> >> >
> >> > --
> >> > Claus-D. Mayer                       | http://www.bioss.ac.uk
> >> > Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
> >> > Rowett Research Institute            | Telephone: +44 (0) 1224 716652
> >> > Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349
> >> >
> >> > _______________________________________________
> >> > Bioconductor mailing list
> >> > Bioconductor at stat.math.ethz.ch
> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>
> >>-------------------------------------
> >>Wolfgang Huber
> >>European Bioinformatics Institute
> >>European Molecular Biology Laboratory
> >>Cambridge CB10 1SD
> >>England
> >>Phone: +44 1223 494642
> >>Http: www.ebi.ac.uk/huber
> >
> > Naomi S. Altman                  814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics              814-863-7114 (fax)
> > Penn State University            814-865-1348 (Statistics)
> > University Park, PA 16802-2111
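Wolfgang's suggestion in the quoted thread, treating m two-colour arrays as 2*m single-channel arrays at the normalization stage and applying a method such as "quantiles", can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the limma or vsn implementation: the function names `split_two_colour` and `quantile_normalize` are hypothetical, the inputs are assumed to be background-corrected log-intensities with the same spots in the same order on every array, and ties are broken by position rather than averaged. vsn, the other method he mentions, is a different (variance-stabilizing) approach and is not shown.

```python
def split_two_colour(arrays):
    """Turn m two-colour arrays into 2*m single-channel columns.

    `arrays` is a list of (red, green) pairs, each a sequence of log-intensities
    for the same spots in the same order (an assumption of this sketch).
    The result is [red_1, green_1, red_2, green_2, ...].
    """
    channels = []
    for red, green in arrays:
        channels.append(list(red))
        channels.append(list(green))
    return channels


def quantile_normalize(columns):
    """Give every column (channel) the same empirical distribution.

    The reference distribution is the mean of the k-th smallest values across
    columns; each column's values are replaced by the reference value of the
    same rank. Ties are broken by position, which is a simplification.
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all channels must contain the same number of spots")

    # Average the sorted columns rank by rank to build the reference.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_cols)]

    normalized = []
    for col in columns:
        # order[k] is the position of the k-th smallest value in this column
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, pos in enumerate(order):
            out[pos] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy arrays with three spots each (made-up log2 intensities)
    arrays = [([8.0, 10.0, 12.0], [7.5, 9.0, 11.5]),
              ([9.0, 11.0, 13.0], [8.0, 10.5, 12.0])]
    for channel in quantile_normalize(split_two_colour(arrays)):
        print(channel)
```

On real data each column would hold one channel's intensities for all spots on an array. As Wolfgang says, this treatment is meant for arrays where between-chip spotting variability is not strong, and the later contrasts for differential expression should still be arranged so that they are not confounded with dye.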



Wolfgang Huber, EMBL European Molecular Biology Laborat…

> Basically, you're saying that if the arrays are very high quality, you can
> get away with an inefficient analysis.

Gordon, I did not say that, it sounds stupid, please do not misquote people.

> Naomi is referring to what I call the "intraspot" correlation, see for
> example the intraspotCorrelation() function in the limma package, and it
> is critically important. The correlation isn't a bad thing, nor is it
> restricted to poor quality arrays. Rather it means that contrasts
> estimated within a spot are highly accurate.

I agree that contrasts estimated from within one array are more accurate than those from different arrays. Note that when I said "treat a two-color array like two single-color arrays", this was in the paragraph on how to normalize, not on differential expression. But apparently this still triggered off a few people...

Two aspects were raised by Claus' question that started this thread: how to normalize these data, and how to identify differentially expressed genes. My experience is that multi-channel normalization methods like vsn (or quantiles, for that matter) work well for sets of mass-produced two-color arrays. Then, it is still better to look at contrasts within arrays. But it is at least possible (even if less accurate/precise) to look at contrasts across arrays by directly comparing the intensities, rather than always having to go through a chain of log-ratios.

> Why not do it properly and get the full benefit of the high
> quality arrays? My experience is that high quality
> Agilent arrays can beat affy for accuracy if treated properly.

Agreed. Do you think it's because of the two colors or because of the longer (and hence more specific) probes?

Best wishes
Wolfgang
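The difference in accuracy between within-array and across-array contrasts that Wolfgang and Gordon agree on can be made concrete with a deliberately simple model, assumed here for illustration only: every channel has the same variance sigma2 on the log scale, the red and green channels of the same spot have correlation rho (the intraspot correlation), and different spots and arrays are independent. Under that assumption a contrast taken within one spot has variance 2*sigma2*(1 - rho), comparing single channels from independent spots on different arrays gives 2*sigma2, and an indirect comparison through a common reference on two arrays gives twice the within-spot variance. The helper name `contrast_variances` is hypothetical.

```python
def contrast_variances(sigma2, rho):
    """Variances of an A-vs-B log-scale contrast under a toy model.

    Assumptions (illustrative only): each channel has variance sigma2, the two
    channels of the same spot have correlation rho (intraspot correlation),
    and different spots and different arrays are independent.
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    # A in one channel, B in the other channel of the same spot:
    # Var(yA - yB) = sigma2 + sigma2 - 2 * rho * sigma2
    within_spot = 2.0 * sigma2 * (1.0 - rho)
    # A and B read as single channels from independent spots on different arrays
    across_arrays = 2.0 * sigma2
    # A vs R on one array and B vs R on another; the contrast M1 - M2 is the
    # sum of two independent within-spot variances
    via_reference = 2.0 * within_spot
    return {
        "within_spot": within_spot,
        "across_arrays": across_arrays,
        "via_reference": via_reference,
    }


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8, 0.9):
        v = contrast_variances(1.0, rho)
        print(f"rho={rho:.1f}  within={v['within_spot']:.2f}  "
              f"across={v['across_arrays']:.2f}  via_ref={v['via_reference']:.2f}")
```

With rho = 0.8, the value Naomi reports for her unnormalized control spots, the within-spot contrast has one fifth of the variance of the across-array comparison. Real data have unequal channel variances, dye effects and array effects, so these numbers are only indicative.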


Gordon Smyth

On Sat, June 25, 2005 12:36 am, Wolfgang Huber said:
>> Basically, you're saying that if the arrays are very high quality, you can
>> get away with an inefficient analysis.
>
> Gordon, I did not say that, it sounds stupid, please do not misquote
> people.

Actually, I didn't quote you at all. The word "Basically" in this context is a signal that I am interpreting your comments and their consequences rather than quoting you. You can disagree with my interpretation or argue that it is mistaken, as you do below, but being misquoted is quite a different thing! :)

>> Naomi is referring to what I call the "intraspot" correlation, see for
>> example the intraspotCorrelation() function in the limma package, and it
>> is critically important. The correlation isn't a bad thing, nor is it
>> restricted to poor quality arrays. Rather it means that contrasts
>> estimated within a spot are highly accurate.
>
> I agree that contrasts estimated from within one array are more
> accurate than those from different arrays.

And in order to combine these two types of contrasts efficiently in an analysis, one needs to quantify the difference in accuracy. Hence the need to estimate the intraspot correlation.

> Note that when I said
> "treat a two-color array like two single-color arrays", this was in
> the paragraph on how to normalize, not on differential expression. But
> apparently this still triggered off a few people...

Part of the trouble is that you continued in the next paragraph to consider differential expression, and you seemed (to me at least) to be implying that the same conclusions continue to apply with only one caveat. Thanks for the clarification. As you know, I personally prefer to take advantage of the two-colour technology even at the normalisation stage, but that's another matter.

> Two aspects were raised by Claus' question that started this thread:
> how to normalize these data, and how to identify differentially
> expressed genes. My experience is that multi-channel normalization
> methods like vsn (or quantiles, for that matter) work well for sets of
> mass-produced two-color arrays. Then, it is still better to look at
> contrasts within arrays. But it is at least possible (even if less
> accurate/precise) to look at contrasts across arrays by directly
> comparing the intensities, rather than always having to go through a
> chain of log-ratios.

Claus asked what is specific to Agilent. As I understand it, all your comments here apply to any type of two-colour array. Did you intend to say something specific about Agilent arrays, or am I still misunderstanding what you mean?

>> Why not do it properly and get the full benefit of the high
>> quality arrays? My experience is that high quality
>> Agilent arrays can beat affy for accuracy if treated properly.
>
> Agreed. Do you think it's because of the two colors or because of the
> longer (and hence more specific) probes?

Well, Affy actually has more nucleotides per gene than Agilent when one takes into account the multiple probes per probe set. I don't want to speculate too much on the reasons, but the fact that Agilent can reliably lay down 80mers rather than 25mers strongly suggests that the deposition process is more accurate.

The two colours are certainly important. Calculations in our lab suggest that one typically loses around 70% of the information in a two-colour experiment by going from direct to indirect comparisons, and 80-90% when going to single-channel comparisons across different arrays without taking the intraspot correlations into account. So Agilent may be well behind Affy if not treated optimally.

Gordon
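Gordon's point that combining within-spot and across-array contrasts requires quantifying their relative accuracy can be illustrated with a crude moment estimate of the intraspot correlation from replicate spots, in the spirit of the within-gene correlation Naomi describes. This is an assumption-laden sketch, not what limma's intraspotCorrelation() does (that function is model-based and should be preferred in practice): each channel is centred on its gene mean across replicate spots on one array, and the centred cross-products are pooled over genes. The second helper converts rho into the fraction of information lost under an equal-variance model in which a within-spot contrast has variance 2*sigma2*(1 - rho) and a comparison of independent single channels has variance 2*sigma2. Both function names, `pooled_within_gene_correlation` and `information_lost`, are hypothetical.

```python
from math import sqrt


def pooled_within_gene_correlation(spots):
    """Crude estimate of the intraspot correlation from replicate spots.

    `spots` is an iterable of (gene_id, red, green) log-intensities from one
    array. Each channel is centred on its gene mean, so the gene's expression
    level drops out, and the centred cross-products are pooled over genes.
    Genes with a single spot carry no within-gene information and are skipped.
    """
    groups = {}
    for gene, red, green in spots:
        groups.setdefault(gene, []).append((red, green))

    sxy = sxx = syy = 0.0
    for pairs in groups.values():
        if len(pairs) < 2:
            continue
        red_mean = sum(r for r, _ in pairs) / len(pairs)
        green_mean = sum(g for _, g in pairs) / len(pairs)
        for r, g in pairs:
            dr, dg = r - red_mean, g - green_mean
            sxy += dr * dg
            sxx += dr * dr
            syy += dg * dg

    if sxx == 0.0 or syy == 0.0:
        raise ValueError("need replicate spots with non-zero spread in both channels")
    return sxy / sqrt(sxx * syy)


def information_lost(rho):
    """Fraction of information lost by comparing single channels from
    independent spots instead of using within-spot contrasts.

    Assumes equal channel variances; sigma2 cancels in the ratio, so take
    sigma2 = 1. Information is the reciprocal of variance, so the retained
    fraction is var_within / var_across.
    """
    var_within = 2.0 * (1.0 - rho)
    var_across = 2.0
    return 1.0 - var_within / var_across
```

Under this idealized model the fraction of information lost equals rho itself, so correlations of 0.8 to 0.9 correspond to losses of 80-90%. That this matches the range Gordon quotes for single-channel comparisons is a coincidence of the toy model, not a reproduction of the calculations he refers to.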

