help in 2-color data normalization


Jianping Jin ▴ 890

@jianping-jin-1212

Last seen 11.5 years ago

Dear list,

I started my analysis from the r/gPreProcessedSignal values in Agilent FE output files, without any filtering. The density plot (see attachment) shows that the two channels agree well in the high-intensity range. There are, however, separate red and green minor peaks, located at log2 intensities of about 5 and 4, respectively. The MA plots look fairly normal (see attached).

The experiment was human colon cancer versus Stratagene universal human cancer RNAs. To me, these two minor peaks seem to be more than dye bias alone can explain, since r/gPreProcessedSignal is supposed to have already gone through a lowess normalization or something like that. Could they reflect a "real" difference between the samples and the universal reference?

Has anyone made similar observations? I would appreciate any comments to help me out.

##################################
Jianping Jin Ph.D.
Bioinformatics scientist
Center for Bioinformatics
Room 3133 Bioinformatics building
CB# 7104
University of Chapel Hill
Chapel Hill, NC 27599
Phone: (919)843-6105
FAX: (919)843-3103
E-Mail: jjin at email.unc.edu

Normalization Cancer Colon • 2.2k views

ADD COMMENT • link updated 18.8 years ago by Jeremy Davis-Turak ▴ 50 • written 18.8 years ago by Jianping Jin ▴ 890


Jianping Jin ▴ 890

@jianping-jin-1212

Last seen 11.5 years ago

Sorry, I made a mistake attaching the plots to my last email, so I am sending it again. Apologies for the multiple versions.

Dear list,

I started my analysis from the r/gPreProcessedSignal values in Agilent FE output files, without filtering out any genes except the control spots. The density plot shows that the two channels match well in the high-intensity range. There are, however, separate red and green minor peaks, located at log2 intensities of about 5 and 4, respectively. The MA plots look fairly normal (the plots can be viewed at <http://www.unc.edu/~jjin/graph/>).

The experiment was human colon cancer versus Stratagene universal human cancer RNAs. To me, the two minor peaks seem to be more than dye bias alone can explain, since r/gPreProcessedSignal is supposed to have already gone through a lowess normalization or something like that. Could they reflect a "real" difference between the samples and the universal reference?

Has anyone made similar observations? I would appreciate any comments to help me out.

thanks,

Jianping

ADD COMMENT • link 18.8 years ago Jianping Jin ▴ 890


I don't think you can say there are any real differences based on those peaks. A log2 value of 5 corresponds to an intensity of only 32, which is extremely low. In your MA plots you can see a kind of "blob" in one direction, at the very left of the plots. It looks to me (without being familiar with the actual processing you used) as if anything below 8 or so on the log2 scale is produced by very low intensity spots, and those measurements cannot be very reliable.

If you filter out the low intensity spots (on BOTH channels, on all slides), you will probably clean up that area of the graph and remove the spots that produced those small peaks.

Jose

--
Dr. Jose I. de las Heras
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Email: J.delasHeras at ed.ac.uk
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360
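The filtering step suggested above can be sketched roughly as follows. This is only an illustration in plain Python, not a Bioconductor workflow: the function name, the tuple layout, and the cutoff of 8 on the log2 scale are assumptions, and it takes one possible reading of "on BOTH channels", namely dropping a spot only when red and green are both below the cutoff, so that probes with signal in a single channel are kept.

```python
import math


def filter_low_intensity(spots, min_log2=8.0):
    """Drop spots whose red AND green log2 intensities are both below min_log2.

    `spots` is a list of (probe_id, red, green) tuples holding positive raw
    intensities. The layout and the default cutoff are illustrative only.
    """
    kept = []
    for probe_id, red, green in spots:
        red_low = math.log2(red) < min_log2
        green_low = math.log2(green) < min_log2
        # Keep the spot unless both channels sit in the low-intensity range.
        if not (red_low and green_low):
            kept.append((probe_id, red, green))
    return kept


# Hypothetical spots: "a" is low in both channels, the rest are not.
example_spots = [("a", 32, 16), ("b", 4096, 20), ("c", 300, 500), ("d", 10, 1000)]
print([probe_id for probe_id, _, _ in filter_low_intensity(example_spots)])
```

A stricter variant would require both channels to reach the cutoff, but that would also discard probes that are expressed in only one of the two samples. To follow the "on all slides" part of the advice, the same test would have to hold for a probe on every slide before it is dropped.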

ADD REPLY • link 18.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


Hi Jose,

Thanks for your comments! I agree that the intensities of those small peaks are too low to be reliable. We can remove them from further analyses, no question about that.

As for my earlier question of whether they could reflect a "real" difference between the colon cancer and the universal cancer cell line RNAs, I think more needs to be considered than just removing those spots. What I noticed is that some probes hybridize only with the reference RNAs and some others only with the colon cancer samples (see "RG_cutoff.jpeg" at <http://www.unc.edu/~jjin/graph/>). Taking one chip as an example, 4548 genes showed green signals above 2^8 with red signals below 2^6, and 1831 genes showed red signals above 2^8 with green signals below 2^5. In both cases the maximum signal, red or green, can be as high as 2^12. This observation suggests that there are some real differences between the RNAs.

This raises another question: is the pooled universal cancer RNA an ideal reference? It may make the results for some genes difficult to interpret.

Any comments will be appreciated!

Jianping

--On Thursday, May 10, 2007 6:46 PM +0100 J.delasHeras at ed.ac.uk wrote:

> I don't think you can say there are any real differences based on
> those peaks. A log2 of 5 comes from an intensity value of only 32,
> that's extremely low. In your MA plots you can see a kind of a "blob"
> in one direction, at the very left of the plots... it looks to me
> (without being familiar with the actual processing you used), that
> anything below 8 or so is produced by very low intensity spots, and
> the measurements cannot be very reliable.
> If you do a filtering to remove low intensity spots (on BOTH channels,
> on all slides) you will probably clean up that area of the graph, and
> will remove the spots that produced those small peaks.
>
> Jose
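The counts quoted in this message (green above 2^8 with red below 2^6, and red above 2^8 with green below 2^5) can be reproduced with a tally like the one below. It is a sketch in plain Python: the function name and the (red, green) tuple layout are assumptions, and only the three cutoffs come from the post.

```python
def count_channel_specific(spots, high_log2=8, low_red_log2=6, low_green_log2=5):
    """Count probes with strong signal in one channel and weak signal in the other.

    `spots` is a list of (red, green) raw intensities (hypothetical layout).
    Returns (green_only, red_only) using the cutoffs described in the post.
    """
    high = 2 ** high_log2
    low_red = 2 ** low_red_log2
    low_green = 2 ** low_green_log2
    green_only = 0
    red_only = 0
    for red, green in spots:
        if green > high and red < low_red:
            green_only += 1  # signal essentially only in the green channel
        elif red > high and green < low_green:
            red_only += 1  # signal essentially only in the red channel
    return green_only, red_only


# Hypothetical intensities, not taken from the arrays discussed here.
example = [(30, 500), (1000, 20), (1000, 40), (100, 100), (63, 257)]
print(count_channel_specific(example))
```

Because the two "low" cutoffs differ (2^6 for red, 2^5 for green), the two counts are not measured on the same footing, which is worth keeping in mind when comparing 4548 with 1831.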

ADD REPLY • link 18.8 years ago Jianping Jin ▴ 890


Hi Jianping,

> In terms of my previous question of whether or not they could be "real"
> difference existing between the colon cancer and the universal cancer
> cell line RNAs, considerations may be given beyond just removing those
> spots. What I noticed was that some probes can only be hybridized with
> the reference RNAs and some others only with colon cancer samples (see
> "RG_cutoff.jpeg" at <http://www.unc.edu/~jjin/graph/>). Take one chip
> as an example, 4548 genes showed green signals more than 2^8 with red
> signals less than 2^6, and 1831 genes showed red signal more than 2^8
> with green signal less than 2^5. On both cases maximum signals, red or
> green, can be as high as 2^12. The observation suggested that there
> exist some real differences between RNAs.

I am not surprised that you can find individual genes that have signal in only one of the samples, either the reference or the cancer one. In fact, this is the sort of thing I am usually looking for: genes that are either silenced or activated in cancer with respect to a "normal" reference.

The plot you're showing does not appear to come from normalised arrays, in which case you can infer little from the differences in the distributions. What it does show is that you have very weak signal on both channels on both arrays...

Normalise your data (within arrays, probably using some "flavour" of loess), and look at the MA plots: that's a better picture of what's going on. In an ideal plot, genes that are only expressed in one sample tend to cluster along the left 2 sides of an imaginary diamond... for instance:

http://mcnach.com/MISC/MAplot.png

This is a very unusual MA plot, from an experiment where a great many genes are activated (a cell line transfected with a strong activator, hybridised against the non-transfected cells). I drew the "imaginary diamond" in red, and numbered 1 and 2 the two sides I was talking about. Along 1 you get genes that are activated in one sample (with M>0), and along 2 you would get genes silenced in the same sample (with M<0).

This experiment is unusual in that it lets you see clearly a "spike" of activated genes along "1". In most experiments you don't see anything like that, but that's the area where, ideally, this sort of gene will cluster. If there are many genes that have signal in only one of your samples, you may see a well populated "cloud" around these areas. Your MA plots seem to me to indicate that this is the case (starting from A around 8+; the stuff on the left seems a little artifactual)... but you really need to dig in deeper if you want some clear answers ;)

> This raises another question. Is the pooled universal cancer RNA an
> idea reference? It may create difficulties in explanation of results
> for some genes.

Ideal? It depends on the experiment, I suppose. It all depends on what questions you're asking. Even very closely related samples from similar tissues, one cancerous and one normal, have lots of expression differences. Your answers will of course be determined by what comparisons you're making, what references you choose, etc. A pooled "universal cancer" RNA can potentially contain very different types of cells, which can be good or bad, depending on what you're after, really...

Jose
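One way to see where the two left sides of the "imaginary diamond" come from is a small calculation. The sketch below assumes the usual convention M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, which the thread does not spell out, and a hypothetical background floor of 32 for the channel that has no real signal. If green is stuck at the floor, then M = 2 * (A - log2(floor)), a straight line of slope +2; if red is stuck at the floor, the slope is -2.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from positive raw intensities."""
    log_red = math.log2(red)
    log_green = math.log2(green)
    return log_red - log_green, 0.5 * (log_red + log_green)


def diamond_edge(a, floor):
    """M expected on the upper-left edge when green sits at `floor`."""
    return 2.0 * (a - math.log2(floor))


floor = 32.0  # hypothetical background-level intensity (log2 = 5)
for red in (64.0, 256.0, 4096.0):
    m_up, a_up = ma_values(red, floor)   # signal only in red: side "1"
    m_down, _ = ma_values(floor, red)    # signal only in green: side "2"
    print(a_up, m_up, diamond_edge(a_up, floor), m_down)
```

Under these assumptions the two edges meet at A = log2(floor) and M = 0, which is why a cloud of one-sample-only genes opens out from the left of the plot.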

ADD REPLY • link 18.8 years ago J.delasHeras@ed.ac.uk ★ 1.9k


Jeremy Davis-Turak ▴ 50

@jeremy-davis-turak-2096

Last seen 5.8 years ago

United States

Hi Jianping,

> In terms of my previous question of whether or not they could be "real"
> difference existing between the colon cancer and the universal cancer cell
> line RNAs, considerations may be given beyond just removing those spots.
> What I noticed was that some probes can only be hybridized with the
> reference RNAs and some others only with colon cancer samples (see
> "RG_cutoff.jpeg" at <http://www.unc.edu/~jjin/graph/>). Take one chip as
> an example, 4548 genes showed green signals more than 2^8 with red
> signals less than 2^6, and 1831 genes showed red signal more than 2^8 with
> green signal less than 2^5. On both cases maximum signals, red or green,
> can be as high as 2^12. The observation suggested that there exist some
> real differences between RNAs.

If these differences you see are really due to differences in the cell types, you should see the reverse effect in the dye-swapped arrays.

You may also want to have a look at the raw signals, and also check the quality of the slides (just because the slides are Agilent doesn't mean they're perfect).

Jeremy Davis-Turak
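The dye-swap argument can be made concrete with a simple additive model, which is an assumption for illustration rather than something stated in the thread: on the forward array M = effect + dye_bias, and on the swapped array, where the samples change channels, M = -effect + dye_bias. Half the difference then estimates the biological effect and half the sum estimates the dye bias. The function name is hypothetical.

```python
def split_dye_swap(m_forward, m_swapped):
    """Separate a sample effect from a dye bias for one probe.

    Assumes M = log2(red / green) on both arrays and the additive model
    described in the text above; both inputs are log2 ratios.
    """
    effect = (m_forward - m_swapped) / 2.0
    dye_bias = (m_forward + m_swapped) / 2.0
    return effect, dye_bias


# A real difference flips sign between the two arrays...
print(split_dye_swap(3.0, -2.0))
# ...whereas a pure dye artefact keeps the same sign on both.
print(split_dye_swap(1.0, 1.0))
```

A probe whose M keeps the same sign and size after the swap is therefore more likely to reflect the dyes than the samples.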

ADD COMMENT • link 18.8 years ago Jeremy Davis-Turak ▴ 50


Thanks Jeremy,

You make a good point. Unfortunately, dye-swapped arrays were not considered in the original experimental design. We did check the quality of the slides and found them to be good, except for a couple of samples that we removed from the analysis because of gridding problems.

Thanks again for your comment!

Jianping

--On Saturday, May 12, 2007 11:50 AM -0700 Jeremy Davis-Turak <jeremydt at gmail.com> wrote:

> If these differences you see are really due to differences in the cell
> types, you should see the reverse effect in the dye-swapped arrays.
>
> You may also want to have a look at the raw signals, and also check
> the quality of the slides (just because the slides are Agilent doesn't
> mean they're perfect).
>
> Jeremy Davis-Turak

ADD REPLY • link 18.8 years ago Jianping Jin ▴ 890
