Question about normalization of microarray data


Johan Lindberg ▴ 270

@johan-lindberg-815

Last seen 9.6 years ago

Hi all. I have a question about normalization of microarray data.

In our lab we use in-house spotted cDNA arrays. So far we have used a commercial reference for our reference-design experiments. We are now trying a new approach, but we have problems with normalization. We pooled the PCR product from every spot on the chip and ran in vitro transcription on the pool, so we have RNA corresponding to every spot on the chip. This RNA is then used as the reference. It is much cheaper, and we get signal from every spot on the chip instead of having spots with no signal in either channel.

However, the MA-plot is skewed towards the reference. In this pilot case about 2000 spots give signal only in the reference channel, which skews the plot. This violates several assumptions made when normalizing the data, e.g. lowess normalization assumes that the ratio R/G is 1 for most spots.

With this kind of data one channel is expected to be much stronger than the other, and we want to keep the normalization within slide (to be able to correct for spatial and intensity-dependent biases). The only way I can think of is to spot a lot of control spots (not present in the tested RNA or the reference RNA) and use these to normalize the data.

Any comments or tips on how to normalize this kind of data are greatly appreciated.
Best regards
//Johan Lindberg

Johan Lindberg
Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office) +46 8 553 783 44
Fax +46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B
http://www.biotech.kth.se/molbio/microarray/index.html
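To make the skew described in the question concrete, here is a minimal Python sketch of the MA transformation, with M = log2(R/G) and A = 0.5 * log2(R * G). The intensities are made-up toy values, and putting the sample in the red channel and the pooled reference in the green channel is an assumption, since the post does not say which dye carries which.

```python
import math
from statistics import mean

def ma_values(red, green):
    """Return (M, A) lists for paired two-channel spot intensities.

    M = log2(R/G) is the log ratio, A = 0.5 * log2(R*G) the mean log intensity.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

# Hypothetical toy data: sample in red, pooled PCR-product reference in green.
# Spots 0-3 are expressed in the sample; spots 4-5 give signal only in the
# reference (the sample channel sits near background).
red = [1000.0, 2000.0, 500.0, 4000.0, 50.0, 50.0]
green = [1000.0, 2000.0, 500.0, 4000.0, 3200.0, 1600.0]

m_all, a_all = ma_values(red, green)
m_expressed = m_all[:4]

mean_expressed = mean(m_expressed)  # centred on zero
mean_all = mean(m_all)              # dragged negative by reference-only spots

print("mean M, expressed spots only:", mean_expressed)
print("mean M, all spots:", mean_all)
```

Any normalization that centres M on all spots would then shift the expressed spots upwards, which is the problem the question raises.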

Microarray Normalization • 1.3k views

ADD COMMENT • link updated 19.4 years ago by STKH Steen Krogsgaard ▴ 150 • written 19.5 years ago by Johan Lindberg ▴ 270


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378

Last seen 9.6 years ago

Just a quick thought (off the top of my head): presumably the spots that give signal only in the reference are of interest only in a "they're off" kind of way. You won't (and can't) actually be interested in fold changes for those spots. Therefore, you could remove these spots, normalise the rest of the data with loess, and then either analyse just that data or replace the spots you removed with a value that you are satisfied means "off" in the experimental sample.

Mick
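The suggestion in this reply (drop the reference-only spots, then loess-normalise the rest) could look roughly like the Python sketch below. It is a simplified, single-pass local linear fit with tricube weights, not the robust lowess implemented in Bioconductor packages. The `floor` threshold, the function names, and the toy data are all hypothetical, and the sample is again assumed to be in the red channel.

```python
import math

def local_linear_fit(x, y, x0, span=0.5):
    """Tricube-weighted local linear estimate of y at x0 (one loess-style pass)."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))      # number of neighbours in the window
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] or 1e-12                    # bandwidth: k-th nearest distance
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def normalise_excluding_off(red, green, floor):
    """Loess-correct M for spots with sample signal >= floor.

    Spots below the floor are treated as "off" and returned as None, so the
    caller can drop them or substitute a value of their own choosing.
    """
    keep = [i for i, r in enumerate(red) if r >= floor]
    m = [math.log2(red[i] / green[i]) for i in keep]
    a = [0.5 * math.log2(red[i] * green[i]) for i in keep]
    fitted = [local_linear_fit(a, m, ai) for ai in a]
    out = [None] * len(red)
    for i, mi, fi in zip(keep, m, fitted):
        out[i] = mi - fi                         # subtract the intensity-dependent trend
    return out

# Hypothetical toy data: eight expressed spots with a dye bias that grows with
# intensity, plus two spots that give signal only in the reference (green).
green = [2.0 ** (8 + i) for i in range(8)] + [5000.0, 8000.0]
red = [g * 2.0 ** (0.1 * i) for i, g in enumerate(green[:8])] + [30.0, 40.0]

normalised = normalise_excluding_off(red, green, floor=100.0)
print(normalised)
```

In this toy case the bias is exactly linear in A, so the corrected M values of the eight kept spots come out at essentially zero, while the two reference-only spots stay flagged as `None` rather than influencing the fit.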

ADD COMMENT • link 19.5 years ago michael watson IAH-C ★ 3.4k


STKH Steen Krogsgaard ▴ 150

@stkh-steen-krogsgaard-797

Last seen 9.6 years ago

Hi Johan,

since you don't seem to have a suitable common reference, how about using a balanced block design instead?

cheers
Steen

ADD COMMENT • link 19.5 years ago STKH Steen Krogsgaard ▴ 150


On Nov 18, 2004, at 6:42 AM, STKH (Steen Krogsgaard) wrote:

> Hi Johan,
>
> since you don't seem to have a suitable common reference, how about
> using a balanced block design instead?
>
> cheers
> Steen

Yep. The true power of the two-color design is in comparing two samples of interest within a slide. If one is going to use a common reference, then the common reference should probably have SOME resemblance to the test sample in terms of gene expression. Even a commercially available reference presumably has some variation in expression that mimics that of the test sample better than a non-biologic reference like PCR products does. It will be interesting to see what you end up doing here, but I do agree that using within-array contrasts whenever possible is a good idea.

Sean

ADD REPLY • link 19.5 years ago Sean Davis 21k


May I ask (from a non-statistician's point of view) what you mean by "since you don't seem to have a suitable common reference, how about using a balanced block design instead?" in this context?

As far as I know, a balanced block design takes into account groups of measurements that are expected to be more homogeneous than others (with the same number of observations within each group) when doing an ANOVA. Is that right? Is this not something you would do after normalization, when trying to identify differences in the data? Or were you referring to a model that accounts for dye effects, so that you don't have to normalize?

Best regards
//Johan

ADD REPLY • link 19.5 years ago Johan Lindberg ▴ 270


STKH Steen Krogsgaard ▴ 150

@stkh-steen-krogsgaard-797

Last seen 9.6 years ago

Hi Johan,

I'm no stat wiz myself. It's just that the common reference design and the balanced block design are two more or less equally powerful designs. In your case I got the impression that you do not have a common reference that is biologically relevant, and that the one you have will give you problems during normalization. That's why I suggested a balanced block design. I'm not aware that a BB design should be any different from other designs when normalizing the individual slides (lowess or whatever).

Richard Simon from NCI gave an excellent talk on experimental design at the MGED-7 conference in Toronto this year; you can see his presentation at ftp://linus.nci.nih.gov/pub/techreport/MGED-B.pdf.

cheers
Steen
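For readers unfamiliar with the term, the Python sketch below shows one common reading of a balanced block design for two-colour arrays. It assumes each array carries one sample from each of two classes and that the dye assignment alternates between arrays; that reading is an assumption, since the thread does not spell the design out. The class labels "A" and "B", the function names, and the numbers are hypothetical.

```python
from statistics import mean

def balanced_block_design(n_arrays):
    """Put one class-A and one class-B sample on each array, alternating dyes."""
    if n_arrays % 2:
        raise ValueError("need an even number of arrays to balance the dyes")
    design = []
    for k in range(n_arrays):
        if k % 2 == 0:
            design.append({"array": k + 1, "Cy5": "A", "Cy3": "B"})
        else:
            design.append({"array": k + 1, "Cy5": "B", "Cy3": "A"})
    return design

def class_difference(design, log_ratios):
    """Mean A-minus-B log2 difference for one gene.

    log_ratios holds log2(Cy5/Cy3) per array; the sign is flipped on arrays
    where class A is labelled with Cy3.
    """
    oriented = [m if d["Cy5"] == "A" else -m for d, m in zip(design, log_ratios)]
    return mean(oriented)

design = balanced_block_design(4)

# Hypothetical gene: true A-vs-B log2 difference of 1.0 and a constant dye
# bias of 0.3 towards Cy5 on every array.
true_diff, dye_bias = 1.0, 0.3
observed = [(true_diff if d["Cy5"] == "A" else -true_diff) + dye_bias
            for d in design]

estimate = class_difference(design, observed)
print(design)
print("estimated A-B log2 difference:", estimate)
```

Because half of the arrays carry class A in each dye, the constant dye bias cancels in the average. That is one reason the within-array comparison does not need a common reference at all.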

ADD COMMENT • link 19.4 years ago STKH Steen Krogsgaard ▴ 150
