normalisation assumptions (violation of)


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

Last seen 8.8 years ago

United Kingdom

Hi,

I have a set of data from an experiment where there appears to be an effect of the treatment on a large number of genes. I put scatterplots for 6 of the slides here:

http://mcnach.com/MISC/scatterplots.gif

These are Cy3 vs Cy5, in log scale.

They show that many genes are differentially expressed, and they are mostly on one side only (upregulated; some of those slides are dye swaps).

Would this appear to violate (too much) any of the assumptions made by loess normalisation? Should I investigate other normalisation procedures?

Jose

--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK


ADD COMMENT • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


Sean Davis 21k

@sean-davis-490

Last seen 4 months ago

United States

On 8/7/06 6:59 AM, J.delasHeras at ed.ac.uk wrote:

> Would this appear to violate (too much) any of the assumptions made by
> loess normalisation? Should I investigate other normalisation
> procedures?

First, I would start by doing a VERY thorough evaluation of the slide quality for these slides, as these are very distorted scatterplots. IF the slide quality looks OK, then I would probably stay away from a non-linear normalization method, as these will tend to make your differentially-expressed genes look less differentially-expressed.

Sean
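The attenuation Sean describes can be sketched with simulated data. This is only an illustration in base R, not an analysis of the slides in question: the proportion of up-regulated spots (20%), the true log-ratio (2), the noise level and the loess span are all assumed values, and a plain least-squares `loess` is used. Robust loess fits are usually less sensitive to this, but they are not immune when the one-sided fraction is large.

```r
set.seed(1)
n  <- 5000
A  <- runif(n, 6, 14)                        # average log2 intensity
de <- runif(n) < 0.20                        # assumed: 20% of spots truly up-regulated
M  <- rnorm(n, sd = 0.3) + ifelse(de, 2, 0)  # true log2 ratio: 2 for DE spots, 0 otherwise

# Plain (non-robust) loess of M on A, then subtract the fitted curve.
fit    <- loess(M ~ A, span = 0.4)
M_norm <- M - predict(fit)

mean_de_after   <- mean(M_norm[de])    # ends up below the true value of 2
mean_null_after <- mean(M_norm[!de])   # ends up below the true value of 0

cat("DE spots, mean M after loess:  ", round(mean_de_after, 2), "\n")
cat("null spots, mean M after loess:", round(mean_null_after, 2), "\n")
```

Because the fitted curve is pulled towards the up-regulated spots, subtracting it shrinks their log-ratios and pushes the unchanged spots below zero.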

ADD COMMENT • link 17.8 years ago Sean Davis 21k


Quoting Sean Davis:

> First, I would start by doing a VERY thorough evalutation of the slide
> quality for these slides, as these are very distorted scatterplots. [...]

Hi Sean,

thanks for your reply. The slides are good; I checked them well. The strong effect is not so unexpected, as it involves transfection of cells with a DNA-binding protein fused to a strong transactivator, so in theory the fusion protein could be responsible for the expression of a very large number of genes. There is some specificity to the binding, but there should be many target sites, often at promoters... So the effects are more or less what we expected, I suppose, and the quality of the slides is good. The second spike, going either almost vertical or almost horizontal, should correspond to those genes that are not expressed in the particular cell line but are expressed after transfection.

Do you have any suggestions of what sort of methods to use for the normalisation of such experiments? Until now I used loess for everything, but I wasn't sure it would be okay for this experiment when I saw these plots.

Jose

ADD REPLY • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


On 8/7/06 7:29 AM, J.delasHeras at ed.ac.uk wrote:

> Do you have any suggestions of what sort of methods to use, for the
> normalisation of such experiments? Until now I used loess for
> everything, but I wasn't sure it would be okay for this experiment when
> I saw these plots.

You can certainly try loess and see how the result looks, as scatterplots are notorious for "hiding" where the data are most dense. Alternatively, you could try "rotating" the scatterplot until the body of the data is where you think it should be. I don't know if there is a method in Bioconductor that does this, though.

Sean

ADD REPLY • link 17.8 years ago Sean Davis 21k


Quoting Sean Davis:

[...]
> You can certainly try loess and see how the result looks, as scatterplots
> are notorious for "hiding" where the data are most dense. [...]

Thanks Sean.

I already tried loess, and the MA plot for the first set of data looks like this:

http://mcnach.com/MISC/MAplots2.png

which looks okay to me. You see the ascending diagonal is denser, which contains all those newly activated spots. I knew a few genes that were expected to be there (from RT data) and they line up nicely on that diagonal.

This was without subtracting background. When I attempted to correct for background I ran into problems, mainly because some slides have a higher bkg than usual, and the signal is lower than the local bkg for a good number of spots. When I use "subtract" as the bkg correction method, it results in many negative intensities, and those spots are removed. I then tried "half" to overcome this, so that negative values are turned into an arbitrary 0.5... and this totally flattened the MA plot, and nothing was statistically DE. I showed this on a previous thread:

http://mcnach.com/MISC/MAplots1.png

It's very striking. It leaves me no other choice but not removing background (which is increasingly looking like the best option in general, in my still short experience...)

Jose
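The behaviour of "subtract" and "half" described above can be reproduced on simulated intensities. The sketch below uses base R only and does not call limma; the signal and background distributions are made-up values, and the floor at 0.5 simply mimics what the "half" option is described as doing in this thread (values below 0.5 after subtraction are reset to 0.5).

```r
set.seed(2)
n <- 2000
# Simulated true signals and local backgrounds (arbitrary units, assumed values).
true_R <- rexp(n, rate = 1 / 300)
true_G <- rexp(n, rate = 1 / 300)
Rb <- rnorm(n, mean = 200, sd = 30)            # high local background around the spot
Gb <- rnorm(n, mean = 200, sd = 30)
Rf <- true_R + rnorm(n, mean = 150, sd = 30)   # assumed: less background on the spot itself
Gf <- true_G + rnorm(n, mean = 150, sd = 30)

# "subtract": plain foreground minus background, negatives allowed.
R_sub  <- Rf - Rb
G_sub  <- Gf - Gb
n_lost <- sum(R_sub <= 0 | G_sub <= 0)         # spots whose log-ratio is undefined

# "half"-style floor: anything below 0.5 after subtraction becomes 0.5.
R_half <- pmax(R_sub, 0.5)
G_half <- pmax(G_sub, 0.5)
M_half <- log2(R_half) - log2(G_half)

# Log-ratios of floored spots are driven by the arbitrary floor, not by biology.
floored <- R_sub < 0.5 | G_sub < 0.5
cat("spots lost with 'subtract':", n_lost, "of", n, "\n")
cat("mean |M|, floored spots:   ", round(mean(abs(M_half[floored])), 2), "\n")
cat("mean |M|, other spots:     ", round(mean(abs(M_half[!floored])), 2), "\n")
```

In this toy setting the floored spots produce a large cloud of extreme log-ratios whose size depends on the chosen floor, which is one way a subsequent curve fit or test can be distorted.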

ADD REPLY • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


On 8/7/06, J.delasHeras at ed.ac.uk wrote:

> I already tried loess, and this is the MA plot for the first set of
> data looks like this:
>
> http://mcnach.com/MISC/MAplots2.png
>
> which looks okay to me. [...]

This MA plot indicates that the noise levels have become asymmetric after curve-fit normalization. I say so because your data is "bending" upwards instead of being a nice flat line, cf. Frame 33 of 48 in http://www.maths.lth.se/bioinformatics/calendar/20051108/. If this is true, your downstream tests might not work that well.

> This was without substracting background.
> When I attempted to correct for background I run into problems. Mainly
> because some slides have a higher bkg than usual, and the signal is
> lower than the local bkg for a good number of spots.

You haven't told us your platform. What type of scanner do you use?

> When I use "subtract" as a bkg correction method, it results in many
> negative intensities, and those spots are removed.

I would say that this is expected for signals around zero (on the intensity scale); if you have no biological signal, there is a 50-50 chance that the background is stronger than the foreground. The problem is how to deal with those. Also, do NOT be afraid of the large noise levels at lower intensities; you do expect to see these when your signals get closer to noise levels (closer to zero). If you want to stabilize the variance structure there are methods for this, but then you pay the price of losing accuracy (you get biased log-ratio estimates).

> I then tried "half" to overcome this, so that negative values are
> turned into an arbitrary 0.5... and this totally flattened the MA
> plot, and nothing was statistically DE.

Yes, 0.5 is very arbitrary. Why not 5, 0.05, or 0.0000000000005? You might want to look into Kooperberg's background correction methods, or the ones in limma.

> It's very striking. It leaves me no other choice but not removing
> background (which is increasingly looking like the best option in
> general, in my still short experience...)

What scanner do you have? You might have an offset in your scanner (quite commonly added to avoid analogue negative signals being truncated to zero), e.g. Axon and Agilent introduce about 20-25 units (which is significant). With a simple scan protocol it is easy to check whether your scanner introduces an offset. The method is described in

H. Bengtsson, G. Jönsson and J. Vallon-Christersson, Calibration and assessment of channel-specific biases in microarray data with extended dynamical range, BMC Bioinformatics, 2004, 5:177.

and the estimation and calibration methods are in aroma.light. The scanner offset is a global constant, which means that you only fit a single parameter per channel. That is, subtracting this "background" from the foreground signals does not introduce as much noise as subtracting the image-analysis estimated backgrounds unique to each spot. This will leave you with fewer (probably no) non-positive signals. It might also be enough to remove the curvature seen in your raw MA plots. If so, your remaining problem will be how to estimate the overall relative scale factor between the two channels, which is only one parameter; it should be easier than using non-parametric curve-fit methods.

I would also like to encourage you to read up on what affine transformations (offset plus rescaling) can do to your data and especially your MA plots:

H. Bengtsson and O. Hössjer, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method, BMC Bioinformatics, 2006, 7:100.

When you understand the bits and pieces of what's going on there you will also be much more careful when you pick your normalization method. I would say that curve-fit (loess, lowess, spline, ...) normalization is often overkill and corrects for a symptom rather than fixing the underlying problem. Quantile normalization can be interpreted as a non-parametric method that corrects for affine transformations, but it has a problem at the lower and higher intensities. Variance stabilization methods (Rocke & Durbin, W. Huber) have an explicit affine component in their models, so they are much more suited to this type of transform. Plain affine normalization (aroma.light) corrects for affine transformation without controlling for variance (on purpose). The estimation methods also differ between the latter two approaches.

I hope this is a good start.

Cheers

Henrik
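The effect of an affine transformation (offset plus rescaling) on an MA plot can be sketched with simulated data. Everything here is assumed for illustration: the offsets `aR` and `aG`, the scale factors `bR` and `bG`, the noise level, and the helper `ma()` are hypothetical, and the true offsets are taken as known, whereas in practice they would have to be estimated (for example with the aroma.light methods mentioned above).

```r
set.seed(3)
n  <- 5000
z  <- 2^runif(n, 4, 15)          # true signal, identical in both channels (no DE)
aR <- 20;  aG <- 60              # assumed channel offsets
bR <- 1.0; bG <- 1.4             # assumed channel scale factors
R  <- aR + bR * z * 2^rnorm(n, sd = 0.1)
G  <- aG + bG * z * 2^rnorm(n, sd = 0.1)

# Hypothetical helper: log-ratio M and average log-intensity A.
ma <- function(R, G) list(M = log2(R / G), A = 0.5 * log2(R * G))

raw <- ma(R, G)                          # offsets present: M depends on A
cal <- ma(R - aR, G - aG)                # offsets removed: flat, shifted by log2(bR / bG)
scl <- ma((R - aR) / bR, (G - aG) / bG)  # also rescaled: centred on zero

# Crude curvature measure: mean M in the lowest 20% of A minus the highest 20%.
bend <- function(x) {
  lo <- x$A < quantile(x$A, 0.2)
  hi <- x$A > quantile(x$A, 0.8)
  mean(x$M[lo]) - mean(x$M[hi])
}
curv_raw <- bend(raw)
curv_cal <- bend(cal)
cat("curvature with offsets:   ", round(curv_raw, 3), "\n")
cat("curvature without offsets:", round(curv_cal, 3), "\n")
cat("median M after rescaling: ", round(median(scl$M), 3), "\n")
```

With unequal offsets the low-intensity end of the MA plot bends even though no spot is differentially expressed; removing one constant per channel flattens it, leaving a single scale factor to estimate.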

ADD REPLY • link 17.8 years ago Henrik Bengtsson ★ 2.4k


Quoting Henrik Bengtsson:

> You haven't told us your platform. What type of scanner do you use?

GenePix 4200AL.

> Yes, 0.5 is very arbitrary. Why not 5, 0.05, or 0.0000000000005?
> You might want to look into Kooperberg's background correction
> methods, or the ones in limma.

Actually, I tried other numbers too, just to check that they did not have a drastic effect on the final results. I just wanted a positive number (actually >1 is better, so that I can take logs directly) that is low enough that I get a high M value when I divide the signal of the other channel by it. M values of genes that have no detectable signal on one channel are meaningless, in that they don't represent any kind of fold enrichment... but they're useful to help me pick those genes.

> You might have an offset in your scanner [...] The scanner offset is a
> global constant which means that you only fit a single parameter per
> channel. [...] it should be easier than using non-parametric
> curve-fit methods.

I would like to try your package aroma. I've been meaning to for a while. I like your reasoning. But unfortunately my "exploring" time is limited. You probably think that it would be a good investment to dedicate some time now to exploring these issues in more depth... and I would agree... but unfortunately I am not able to. It's not entirely my call...

The problem I had with negative signals is made worse in this particular experiment because I happened to have a few slides with abnormally high background, mainly on the Cy3 channel. The high background was due to a problem in the preparation of the samples. Usually I get pretty clean slides. I'm working on repeating the "bad" slides to help solve this.

> When you understand the bits and pieces of what's going on there you
> will also be much more careful when you pick your normalization
> method. [...]
>
> I hope this is a good start.

As ever, your replies are very useful. I just wish I had a little help so that I could spend more time looking at these details in a lot more depth. But I will do what I can, and the replies received so far are all very useful for me.

Thanks!

Jose

ADD REPLY • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


On 8/8/06, J.delasHeras at ed.ac.uk wrote:

> > You haven't told us your platform. What type of scanner do you use?
>
> GenePix 4200AL.

I have no feedback on this specific model, but I'm keen to hear about your findings.

/Henrik

ADD REPLY • link 17.8 years ago Henrik Bengtsson ★ 2.4k


Hi Jose,

I think you should correct for background since, as you have commented, you have slides with high background intensity and you want to remove background bias. I don't know if you have already tried "normexp".

In any case, regarding the normalization process, I think you should not worry so much about violating the assumption on the number of DE genes. I have worked with an experiment similar to the one you mentioned, using print-tip-loess, and the results were pretty good. It is true that the normalization process is based on some assumptions, but no single microarray experiment fulfils these assumptions...

HTH
Manuel

ADD REPLY • link 17.8 years ago M PEREZ ▴ 110


Quoting M Perez:

> I think you should correct for background since as you
> have commented you have slides with high background
> intensity and you want to remove background biass. I
> dont know if you have already tried "normexp".

Hi Manuel,

I haven't really. I did a long time ago, and what put me off was having to search for the right offset, when I was hoping for something a bit more "automatic" (and at the time I used LimmaGUI, which is a bit more tedious if you want to experiment a little). I should try that.

However, I notice that the background usually appears to have little or nothing to do with the signals measured. The background tends to be very uniform across the slide, and the fact that I get "negative spots", where you see less signal on the actual spot than around it, makes me think that the spotted cDNA acts as a pretty good block against that general background. In other words, I am not convinced that the background measured on the glass has much to do with the signal I measured on a spot of DNA, and subtracting background may actually be a bad thing to do.

Another reason I think background subtraction doesn't matter much is that on the occasions when I do see some pattern in the background (using 'imageplot', for instance, you can tune the display ranges to enhance and view those patterns), it often doesn't translate into a pattern when you display the red/green ratios, or the signals on their own. Not always, but quite often, from what I've seen. And when you do get some scratches that clearly affect the signal measured, it might make more sense to flag those spots... or to simply rely on the fact that there should be enough replicates, so an odd measurement should not affect the outcome too much (hopefully if on another slide I have another scratch it will not affect the very same spots again :-)

I think I like Henrik Bengtsson's idea about measuring the background inherent to a particular scanner, and subtracting that instead... but I haven't yet explored that properly (hangs head in shame)... the problem with being a one-man operation is that you're pressed to get results that are "good enough" to continue the biology, rather than spending too much time working out the best way to get the most out of the data available. If only I could clone myself... but then I wouldn't like to work with myself... ;-)

Right now I am exploring another avenue: repeating those experiments that gave me high background, with a view to removing the offending slides and using something of better quality. In this case it's relatively simple, but many times I will not have the luxury, therefore I still want to understand the problem with background better.

> Anycase and talking about the normalization process I
> think you dont should be so worry about the violation
> of the number of genes DE in your normalization
> process. I have been working with similar experiment
> that you mentioned using print-tip-loess and the
> results were prety good.

I'm glad to hear that. I had similar comments from other sources, and I must admit that the (very) few controls I had in my experiment seem to behave properly if I apply print-tip-loess (and no bkg correction, because when I do I run into problems, as I mentioned in another thread).

> It is true that the normalization process is basesd in
> some assumptions. But not single microarray experimen
> fullfil these assumptions...

I am aware that loess is pretty robust... I just wasn't sure that it was robust enough in an experiment such as this, where I expect the average median of ratios to be above 1 (although not by much, admittedly).

Thanks for all the comments. I will definitely explore the normexp bkg correction method.

Jose

ADD REPLY • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


On 8/8/06, J.delasHeras at ed.ac.uk <j.delasheras at ed.ac.uk> wrote:
> Quoting M Perez <perezperezmm at yahoo.es>:
>
> > Hi Jose,
> >
> > I think you should correct for background since, as you have commented, you have slides with high background intensity and you want to remove background bias. I don't know if you have already tried "normexp".
>
> Hi Manuel,
>
> I haven't really. I did a long time ago and what put me off was having to search for the right offset, when I was hoping for something a bit more "automatic" (and at the time I used LimmaGUI, which is a bit more tedious if you want to experiment a little). I should try that.
> However, I notice that the background usually appears to have little or nothing to do with the signals measured. The background tends to be very uniform across the slide, and the fact that I get "negative spots", where you see less signal on the actual spot than around it, makes me think that the cDNA spotted acts as a pretty good block against that general background. In other words, I am not convinced that the background measured on the glass has much to do with the signal I measured on a spot of DNA, and subtracting background may actually be a bad thing to do.

That is a very good statement. We have to ask ourselves what kind of "background" there is, not just define background from what methods we have available! For instance, it is possible to prove scientifically that the scanner introduces an offset. It might simply be that the image-analysis based background estimators happen to get close to the scanner background; that does not mean that the signal detected in the proximity of a spot is added to the spot, it just happens to be a good proxy for the scanner offset. That is just a hypothesis, and in general I think that image-background signals are poor and noisy estimators of the scanner offset.

> Another reason I think background subtraction doesn't matter much is that on the occasions when I do see some pattern on the background (using 'imageplot' for instance, you can tune the ranges to display to enhance and view those patterns), it often doesn't translate into a pattern when you display the red/green ratios, or the signals on their own. Not always, but quite often, from what I've seen. And when you do get some scratches that clearly affect the signal measured, it might make more sense to flag those spots... or to simply rely on the fact that there should be enough replicates, so an odd measurement should not affect the outcome too much (hopefully if on another slide I have another scratch it will not affect the very same spots again :-)

Agree.

> I think I like Henrik Bengtsson's idea about measuring the background inherent to a particular scanner, and subtracting that instead... but I haven't yet explored that properly (hangs head in shame)... the problem with being a one-man operation is that you're pressed to get results that are "good enough" to continue the biology, rather than spending too much time in working out what's the best way to get the most out of the data available. If only I could clone myself... but then I wouldn't like to work with myself... ;-)
>
> Right now I am exploring another avenue: repeating those experiments that gave me high background with a view to removing the offending slides and using something of better quality. In this case it's relatively simple, but many times I will not have the luxury, therefore I still want to understand the problem with background better.

Seriously, it is very easy to do scanner calibration. Much easier than repeating experiments. Also, if the scanner offset is stable over time, which I suspect it is, you might only have to do this once every now and then, and simply reuse the same estimate across arrays.

Scan the same array at, say, four different PMTs, e.g. 800V, 700V, 600V and 500V. Keep the array in the scanner between scans to keep everything but the PMT as similar as possible. That way you can reuse the spot mask identified by Axon GenePix Pro on the 800V image for the other images too. You'll get four GPR files. Pull out the foreground signals for one channel at a time from each of them as a vector, e.g. X800, X700, X600, X500, and put them in a matrix:

  X <- matrix(c(X800, X700, X600, X500), ncol=4)

Then estimate and calibrate the signals:

  library(aroma.light)
  Xc <- calibrateMultiscan(X)

'Xc' will be a single vector of length nrow(X). The attribute 'modelFit' will contain the parameter estimates for that channel, i.e. the scanner offset etc. The scanner offset is in 'adiag', that is:

  scannerOffset <- attr(Xc, "modelFit")$adiag

Do the same for the other channel(s). Single-channel users are done here.

FYI: The 'aroma.light' package provides a matrix-only interface to calibration/normalization methods. I have higher-order interfaces in 'aroma' off-Bioconductor, but the above should be enough. When there is time (?!?) I'll also provide wrappers to the 'exprSet' class.

/Henrik

> > In any case, talking about the normalization process, I think you should not be so worried about violating the assumption on the number of DE genes in your normalization process. I have been working with a similar experiment to the one you mentioned, using print-tip-loess, and the results were pretty good.
>
> I'm glad to hear that. I had similar comments from other sources, and I must admit that the (very) few controls I had in my experiment seem to behave properly if I apply print-tip-loess (and no bkg correction, because when I do I run into problems, as I mentioned in another thread)
>
> > It is true that the normalization process is based on some assumptions. But no single microarray experiment fulfils these assumptions...
> > HTH
> > Manuel
>
> I am aware that loess is pretty robust... I just wasn't sure that it was robust enough in an experiment such as this, where I expect the average median of ratios to be above 1 (although not by much, admittedly).
>
> Thanks for all the comments. I will definitely explore the normexp bkg correction method.
>
> Jose
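The multiscan idea in the message above can be tried on simulated data before touching real GPR files. The following is only a sketch under stated assumptions: the signal model (a common offset `a` plus a PMT-specific gain `b`), all numeric values, and the plain PCA fit are illustrative choices made here, and the code does not reproduce what calibrateMultiscan() does internally. It shows why scanning at several PMT settings identifies an offset: the fitted line through the four-dimensional point cloud meets the diagonal at the offset.

```r
# Simulated multiscan data: the same spots scanned at four PMT settings.
# Assumed model (illustrative only): X_k = a + b_k * x + noise.
set.seed(1)
n <- 5000
x <- rexp(n, rate = 1 / 2000)          # hypothetical "true" spot signals
a <- 20                                # hypothetical common scanner offset
b <- c(1.0, 0.45, 0.2, 0.08)           # hypothetical gains at 800V, 700V, 600V, 500V
X <- sapply(b, function(bk) a + bk * x + rnorm(n, sd = 2))
colnames(X) <- c("X800", "X700", "X600", "X500")

# Fit the main axis of the point cloud with ordinary (non-robust) PCA.
pc <- prcomp(X, center = TRUE, scale. = FALSE)
m <- pc$center                         # a point on the fitted line
v <- pc$rotation[, 1]                  # direction of the fitted line

# Find the point m + s * v whose four coordinates are as equal as possible,
# i.e. where the fitted line comes closest to the diagonal (t, t, t, t).
s <- -sum((m - mean(m)) * (v - mean(v))) / sum((v - mean(v))^2)
offsetEstimate <- mean(m + s * v)
print(offsetEstimate)
```

On real data a robust fit is preferable, because saturated and very noisy spots would pull an ordinary PCA line away from the bulk of the data.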

ADD REPLY • link 17.8 years ago Henrik Bengtsson ★ 2.4k


I forgot to add one thing:

On 8/8/06, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Scan the same array at say four different PMTs, e.g. 800V, 700V, 600V and 500V. Keep the array in the scanner between scans to keep everything but the PMT as similar as possible. That way you can reuse the spot mask identified by Axon GenePix Pro on the 800V for the other images too. You'll get four GPR files. Pull out the foreground signals for one channel at the time from each of them as a vector, e.g. X800, X700, X600, X500, and put them in a matrix
>
> X <- matrix(c(X800, X700, X600, X500), ncol=4)

Already here you can see whether you have a scanner offset or not. Plot your data pairwise, zoom in at (0,0), and see whether the data points from the different pairs converge at (0,0) or not:

  par(pch=19)
  plot(NA, xlim=c(0,700), ylim=c(0,700), col=(col <- 1))
  abline(a=0, b=1)
  for (ii in 1:3)
    for (jj in (ii+1):4)
      points(X[,c(ii,jj)], col=(col <- col + 1))

See attached image for example.

/Henrik

Attachment: scannerOffset.png (image/png, 22436 bytes)
https://stat.ethz.ch/pipermail/bioconductor/attachments/20060808/efae6d9c/attachment.png
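The pairwise check described above can also be done numerically. This is a hedged sketch on simulated data, not part of aroma.light: the offset, the gains and the noise level are made-up values, and ordinary least squares is used only because the simulated noise is small. For each pair of scans a straight line is fitted, and the point where that line crosses the diagonal y = x is reported. If there were no offset, all crossings would sit near 0; with a common offset they agree on a non-zero value.

```r
# Simulated scans of the same spots at four PMT settings (illustrative values).
set.seed(2)
n <- 5000
x <- rexp(n, rate = 1 / 2000)          # hypothetical "true" spot signals
a <- 20                                # hypothetical common scanner offset
b <- c(1.0, 0.45, 0.2, 0.08)           # hypothetical relative gains
X <- sapply(b, function(bk) a + bk * x + rnorm(n, sd = 2))

# All six scan pairs (ii, jj) with ii < jj, as in the plotting loop.
scanPairs <- t(combn(4, 2))

# For each pair, fit X[, jj] = alpha + beta * X[, ii] and compute where the
# fitted line crosses the diagonal: t = alpha + beta * t, so t = alpha / (1 - beta).
crossings <- apply(scanPairs, 1, function(p) {
  fit <- lm(X[, p[2]] ~ X[, p[1]])
  alpha <- coef(fit)[[1]]
  beta <- coef(fit)[[2]]
  alpha / (1 - beta)
})
print(round(crossings, 2))
```

With noisier real data the least-squares slope is biased towards zero, so these crossings are better treated as a quick diagnostic than as a final offset estimate.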

ADD REPLY • link 17.8 years ago Henrik Bengtsson ★ 2.4k


Hi.

On 8/7/06, J.delasHeras at ed.ac.uk <j.delasheras at ed.ac.uk> wrote:
> Quoting Sean Davis <sdavis2 at mail.nih.gov>:
>
> > First, I would start by doing a VERY thorough evaluation of the slide quality for these slides, as these are very distorted scatterplots. IF the slide quality looks OK, then I would probably stay away from a non-linear normalization method, as these will tend to make your differentially-expressed genes look less differentially-expressed.
> >
> > Sean
>
> Hi Sean,
>
> thanks for your reply. The slides are good, I checked them well. The strong effect is not so unexpected, as it involves transfection of cells with a DNA-binding protein fused to a strong transactivator, so in theory the fusion protein could be responsible for the expression of a very large number of genes. There is some specificity to the binding, but there should be many target sites, often at promoters... So the effects are more or less what we expected, I suppose, and the quality of the slides is good. The second spike going either almost vertical or almost horizontal should correspond to those genes that are not expressed in the particular cell line, but are expressed after transfection.
>
> Do you have any suggestions of what sort of methods to use for the normalisation of such experiments? Until now I used loess for everything, but I wasn't sure it would be okay for this experiment when I saw these plots.

Roughly what fraction of DEs do you expect/see by visual inspection? BTW, it is not clear if your plots in scatterplots.gif are on the intensity or log scale, but looking at the noise structure I guess on the log scale.

loess(), not lowess(), can be tuned to be very robust against outliers, including non-symmetric ones. I know Gordon Smyth has done some examples/slides on this, but I'm not sure if they're in limma or not. In addition, in the aroma.light package you can assign weights to the data points for some of the normalization methods. Assigning a smaller weight to a data point will make that data point have less of a say in the estimation of the normalization function, but when it comes to normalizing/transforming the data points, all of them are transformed in the same way. So with weights you may be able to tune your robustness against outliers further.

/Henrik
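To make the point about robustness and weights concrete, here is a small simulation with base R's loess(). It is a sketch under assumptions chosen here, not limma's or aroma.light's implementation: 25% of the spots are upregulated on one side only, the intensity-dependent bias is a made-up straight line, and 200 spots are pretended to be known invariant controls that receive a higher prior weight. Weights only influence how the curve is estimated; every spot is then corrected by the same fitted curve.

```r
# Simulated M (log-ratio) versus A (average log-intensity) data.
set.seed(3)
n <- 2000
A <- runif(n, 6, 14)
bias <- 0.1 * (A - 10)                   # hypothetical intensity-dependent dye bias
de <- seq_len(n) <= 0.25 * n             # 25% of spots upregulated, one-sided
M <- bias + rnorm(n, sd = 0.2) + ifelse(de, 1.5, 0)
d <- data.frame(A = A, M = M)

# Hypothetical prior weights: 200 "known" invariant control spots count fully,
# all other spots are down-weighted when the curve is estimated.
d$w <- 0.05
d$w[which(!de)[1:200]] <- 1

plainFit    <- loess(M ~ A, data = d, span = 0.4, degree = 1, family = "gaussian")
robustFit   <- loess(M ~ A, data = d, span = 0.4, degree = 1, family = "symmetric")
weightedFit <- loess(M ~ A, data = d, weights = w, span = 0.4, degree = 1,
                     family = "symmetric")

# Normalise ALL spots with each curve, whatever weight they had in the fit.
normalise <- function(fit) d$M - predict(fit, newdata = d)

# Median normalised log-ratio of the non-DE spots; ideally close to 0.
medians <- c(plain    = median(normalise(plainFit)[!de]),
             robust   = median(normalise(robustFit)[!de]),
             weighted = median(normalise(weightedFit)[!de]))
print(round(medians, 3))
```

A median far below 0 for the non-DE spots means the curve was pulled towards the upregulated genes, which is the attenuation effect Sean warned about.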

ADD REPLY • link 17.8 years ago Henrik Bengtsson ★ 2.4k


Hi Henrik,

> Roughly what fraction of DEs do you expect/see by visual inspection? BTW, it is not clear if your plots in scatterplots.gif are on the intensity or log scale, but looking at the noise structure I guess on the log scale.

Yes, it's log scale. I did mention it in the other thread but forgot to say it here.

What fraction? That's hard to say. Visually I'd say easily 20 or 30%. But that's a rough estimate. I thought this was probably a lot higher than in most experiments.

> loess(), not lowess(), can be tuned to be very robust against outliers including non-symmetric ones. I know Gordon Smyth has done some examples/slides on this, but I'm not sure if they're in limma or not. In addition, in the aroma.light package you can assign weights to the datapoints for some of the normalization methods. Assigning a smaller weight to a datapoint will make that datapoint have less of a say in the estimation of the normalization function, but when it comes to normalize/transform the datapoints, all are transformed equally much. So with weights you may be able to tune your robustness against outliers further.

That's on my "to do" list... I can use weights in limma.

Jose

ADD REPLY • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


J.delasHeras@ed.ac.uk ★ 1.9k


Quoting Henrik Bengtsson <hb at maths.lth.se>:

> In the bigger picture, given that you can identify those 20-30% DEs, how are you going interpret such a large list of genes?
>
> /H

The number of "useful" genes is considerably smaller. This is because my experiment consists of 4 separate sub-experiments, all using a common reference (untransfected cells, in this case).

Three of the sub-experiments consist of the hybridisation of transfected cells vs. untransfected cells. The transfection is of a construct expressing a fusion protein: the first part contains a DNA-binding domain with certain sequence specificity (that we expect to occur in many promoters), the second is a strong transactivator. I'm hoping to detect the binding of these protein domains by looking at which genes are upregulated, especially those that are only expressed after transfection. There are three sub-experiments because they are slightly different proteins.

The fourth experiment is a control: one of the previous fusion proteins with a couple of point mutations that we know to abolish strong specific DNA binding. Transfection of this construct still results in upregulation of many genes.

What I do is analyse all the data together (same common reference), and remove the DE genes of the control experiment (using an FDR of 0.05% or 0.01% as cut-off) from the other three. This substantially reduces the number of genes. From the remainder, I then focus on those that have negligible expression in the untransfected cells, and decent expression afterwards. I then contrast this with what happened in the control experiment (despite not being picked as DE in it). At the end I have tens of candidates. Less than 100. It's not a crazy number, and I then proceed to verification by RT etc., and the biology starts.

When we started the experiment we were not sure what we would get. In theory we could get thousands of genes. It all depends on how good our control is. That's why I used a simple common reference design, as it allows us to easily add another control if we find a better one.

I already analysed a set of data on a cell line, with RNA prepared by somebody else. It worked pretty well, but the effect wasn't as great as I am seeing here. The transfection efficiency may have something to do with it. I checked all my transfections by Western blot and only used the ones that gave me strong expression of the fusion protein; I suspect the other person wasn't so picky.

Jose
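The filtering strategy described above can be written down in a few lines of R. This is a toy sketch: the gene names, the adjusted p-values, the log2 intensities and the three thresholds (`fdrCut`, `lowExpr`, `highExpr`) are all hypothetical values invented for illustration, and only two of the three fusion constructs are shown. It covers the first two steps (dropping genes that are also DE in the mutant control, then keeping genes that are essentially off in untransfected cells and on after transfection); the final comparison against the behaviour in the control experiment is left out.

```r
# Hypothetical per-gene summary; adjusted p-values are assumed to come from a
# joint analysis of all sub-experiments against the common reference.
res <- data.frame(
  gene          = c("g1", "g2", "g3", "g4", "g5", "g6"),
  adjP.fusionA  = c(0.001, 0.002, 0.300, 0.004, 0.010, 0.020),
  adjP.fusionB  = c(0.200, 0.001, 0.400, 0.003, 0.600, 0.010),
  adjP.control  = c(0.500, 0.003, 0.700, 0.800, 0.900, 0.040),
  log2.untransf = c(6.1, 6.0, 11.0, 10.5, 6.5, 6.2),
  log2.fusionA  = c(10.2, 11.0, 11.1, 12.5, 9.8, 10.0),
  log2.fusionB  = c(6.3, 10.5, 11.0, 12.0, 6.6, 9.9),
  stringsAsFactors = FALSE
)

fdrCut   <- 0.05   # hypothetical FDR cut-off
lowExpr  <- 7      # "negligible" log2 expression in untransfected cells
highExpr <- 9      # "decent" log2 expression after transfection

# Genes that respond to the binding-deficient control construct.
controlDE <- res$gene[res$adjP.control < fdrCut]

# Candidates for one construct: DE, not DE in the control, off before, on after.
pickCandidates <- function(adjP, exprAfter) {
  keep <- adjP < fdrCut &
    !(res$gene %in% controlDE) &
    res$log2.untransf < lowExpr &
    exprAfter > highExpr
  res$gene[keep]
}

candidates <- list(
  fusionA = pickCandidates(res$adjP.fusionA, res$log2.fusionA),
  fusionB = pickCandidates(res$adjP.fusionB, res$log2.fusionB)
)
print(candidates)
```

Keeping the control list separate from the per-construct filter matches the common reference design mentioned above: a better control added later only changes `controlDE`.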

ADD COMMENT • link 17.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k
