Normalization quality assessment tools?


Giulio Di Giovanni ▴ 540

@giulio-di-giovanni-950


Hi all,

I have an experiment with almost 300 single-channel arrays (not Affymetrix, but in-house peptide arrays). Unlike in all my past experiments, this time I suspect that the normalization step is unnecessary, or even harmful:

1) The qqplot of the unnormalized intensities looks pretty normal, and normalizing improves the situation only slightly.

2) After normalization I lose some signal (of course), and I lose ALL of what seem to be differentially recognized peptides in a two-group comparison. Before normalization they stand out quite consistently in 200 vs 100 arrays.

3) We are not talking about genes, so most of the usual assumptions behind normalization are not valid here. For example, in this case we have a mass response, where 90% of the spots have higher intensity in one group than in the other. So I cannot use many of the most common normalization methods. I use a linear-model-based method instead, which in the past, on smaller experiments, gave good results. But now even this seems to have too drastic an impact on the data.

Besides the qqplot and the boxplot of the slide intensities (the latter gives no information at all in this case: the 300 boxes are not on the same line either before or after normalization), could you please suggest:

- some diagnostic tools, plots or packages to assess the quality of the normalization procedure;
- some plots or tools used for counter-examples, where the normalization is not only ineffective but even has a negative impact in terms of data loss?

Right now the only thing I can think of is to convert my data matrix into an expression set and apply affy's pseudo-MAplot to the various arrays, but I don't have high hopes ... :(

Any help will be highly appreciated.

Thanks and regards,
Giulio
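
The concern in point 3 can be made concrete with a toy calculation. The sketch below uses invented numbers, not the data from this experiment, and it uses per-array median centering as a stand-in for a generic global normalization (the linear-model method mentioned in the post is not reproduced here). All names are hypothetical. It illustrates that when 90% of the spots are higher in one group, a global centering step can absorb that shift and move it onto the few unchanged spots.

```python
from statistics import mean, median


def median_center(array):
    """Subtract the array's own median from every spot (a simple global normalization)."""
    m = median(array)
    return [x - m for x in array]


def group_difference(group_a, group_b, spot):
    """Difference in mean intensity at one spot between two groups of arrays."""
    return mean(a[spot] for a in group_a) - mean(b[spot] for b in group_b)


# Hypothetical log-intensities: 10 spots per array, 3 arrays per group.
baseline = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
controls = [list(baseline) for _ in range(3)]
# "Mass response": spots 0-8 (90%) are raised by 1.0 in the cases; spot 9 is unchanged.
cases = [[x + 1.0 if i < 9 else x for i, x in enumerate(baseline)] for _ in range(3)]

# Group differences per spot on the raw data.
raw_diff = [group_difference(cases, controls, s) for s in range(10)]

# The same differences after centering every array on its own median.
norm_cases = [median_center(a) for a in cases]
norm_controls = [median_center(a) for a in controls]
norm_diff = [group_difference(norm_cases, norm_controls, s) for s in range(10)]

print("raw: ", raw_diff)
print("norm:", norm_diff)
```

In this toy setting the nine shifted spots show no difference after centering, while the one unchanged spot appears to differ in the opposite direction. Real data are noisier, but the mechanism is the same.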

Normalization convert • 1000 views

updated 14.9 years ago by Wolfgang Huber ★ 13k • written 14.9 years ago by Giulio Di Giovanni ▴ 540


Wolfgang Huber ★ 13k

@wolfgang-huber-3550


EMBL European Molecular Biology Laborat…

Dear Giulio,

it will probably help you to phrase your question more precisely. You referred to "the normalization step" without ever saying what you mean by that. There are many different ways of "normalization" for such a dataset, and, of course, the choice cannot be made a priori, but rather, requires data quality assessment, identifying what the undesired technical effects are that you want to "normalise" away, what property of the data you want to keep high (e.g., the number of differentially binding peptides?) and choosing an appropriate computational method.

The 'arrayQualityMetrics' package might provide some relevant plots. See also https://stat.ethz.ch/pipermail/bioconductor/2011-February/037915.html

Best wishes
Wolfgang

Giulio Di Giovanni wrote on 25/02/11 12:10:
> Hi all,
> I've an experiment with almost 300 arrays, single channel (not Affymetrix, but some in-house made peptide arrays). [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
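
As an illustration of the kind of per-array diagnostic discussed in this thread, here is a minimal sketch of the quantities behind a pseudo-MA plot: each array is compared with a pseudo-reference built from the per-spot median across all arrays, giving M (difference from the reference) and A (average with the reference) on the log scale. It is written in plain Python for illustration and does not call arrayQualityMetrics or affy. The function names, the toy data and the flagging threshold are assumptions.

```python
from statistics import median


def pseudo_reference(arrays):
    """Per-spot median across all arrays (each array is a list of log-intensities)."""
    return [median(spot_values) for spot_values in zip(*arrays)]


def ma_values(array, reference):
    """Return the M and A values of one array against the reference."""
    m = [x - r for x, r in zip(array, reference)]
    a = [(x + r) / 2 for x, r in zip(array, reference)]
    return m, a


def flag_arrays(arrays, threshold):
    """Indices of arrays whose median |M| exceeds a (hypothetical) threshold."""
    ref = pseudo_reference(arrays)
    flagged = []
    for i, arr in enumerate(arrays):
        m, _ = ma_values(arr, ref)
        if median(abs(v) for v in m) > threshold:
            flagged.append(i)
    return flagged


# Toy data: three similar arrays and one that is shifted upwards by about 2.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.0],
    [0.9, 1.9, 3.1, 4.1],
    [3.0, 4.0, 5.0, 6.0],
]
print(flag_arrays(arrays, threshold=1.0))
```

Running the same summary on the raw and on the normalized matrix shows which arrays the normalization actually moved. Plotting M against A for each array additionally shows whether the deviation depends on intensity.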

14.9 years ago Wolfgang Huber ★ 13k


Dear Wolfgang,

thanks a lot for your help and suggestions. You're right, I didn't explain it very well, and I skipped the details of what I "want to normalize".

Following Nathman et al. (1997), we identified many sources of technical variation through the use of linear mixed models, and we removed them (slide effect, patient effect, block and subarray effects, plus various interactions of these; in a parallel study we showed that the "operator" and "day" effects are not relevant).

Please forgive me if I trivialize the problem now, but let's assume for a moment that this approach is satisfactory from the theoretical point of view (as far as any method can be correct "a priori", of course; put simply, it does not need assumptions that conflict with my data). As you correctly guessed, we would like to maximize the number of differentially binding peptides. And let's say this method is the best I can use at the moment from the practical point of view as well: even having seen the data, there is not much else I can do computationally.

Is there a point where you are allowed to say: "OK, I'm going to use the raw data instead"? Or, on the contrary, do you always have to find the right way to "clean your data"? And if you're not happy with what you get, do you have to choose the "lesser of evils" anyway?

Sorry, I see that putting the question this way is not "scientific", and as I said it is "trivial". Of course I'll also keep looking for a better way to identify the sources of technical variability in my experiment and to remove them correctly with minimal harm to the data itself, but I still see it as a very practical issue.

I'd keep the question open, since every suggestion is welcome, and I'll immediately go and check arrayQualityMetrics as you kindly recommended.

Thanks a lot
Giulio

> Date: Fri, 25 Feb 2011 15:38:01 +0100
> From: whuber@embl.de
> To: bioconductor@r-project.org
> Subject: Re: [BioC] Normalization quality assessment tools?
>
> Dear Giulio,
>
> it will probably help you to phrase your question more precisely. [...]
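
One way to make the "raw or normalized?" comparison less subjective is to compute the same per-peptide statistic on both versions of the data and see how the candidate list changes. The sketch below is an assumption-laden illustration, not the method used in this thread: it uses Welch's t-statistic per peptide, a hypothetical cutoff, toy numbers, and no multiple-testing correction. A larger count is not by itself evidence that one version of the data is more correct.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t-statistic for two samples; None when the standard error is zero."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return None
    return (mean(x) - mean(y)) / se


def count_candidates(group_a, group_b, cutoff):
    """Count peptides with |t| above the cutoff.

    group_a and group_b are lists of arrays; each array lists one
    log-intensity per peptide, in the same peptide order.
    """
    hits = 0
    for a_vals, b_vals in zip(zip(*group_a), zip(*group_b)):
        t = welch_t(a_vals, b_vals)
        if t is not None and abs(t) > cutoff:
            hits += 1
    return hits


# Toy data: two peptides, three arrays per group. Only the first peptide differs.
group_a = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]]
group_b = [[3.0, 5.0], [3.2, 5.2], [2.8, 4.8]]
print(count_candidates(group_a, group_b, cutoff=3.0))
```

Calling `count_candidates` once on the raw matrix and once on the normalized matrix, with the same cutoff, gives two numbers that can be compared alongside the diagnostic plots.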

14.9 years ago Giulio Di Giovanni ▴ 540
