QA of two-color array data

Robert Castelo, Barcelona/Universitat Pompeu Fabra

Dear list,

I have very limited experience with the QA of microarray data, and I'd like the opinion of people with more experience on whether there are QA issues with the data I'm analyzing, and whether I could pre-process these data differently to correct for them.

I'm re-analyzing a series of 12 two-color microarray experiments deposited in GEO (acc. GSE13943). These are custom 4x44K Agilent arrays with probes targeting exons and splice junctions in Drosophila melanogaster. The experiments correspond to RNAi knock-downs of 4 RNA-binding proteins, hrp36, hrp38, hrp40 and hrp48 (red channel), against a non-specific RNAi control (green channel), with three independent replicates for each knock-down experiment.

After reading the raw data files into an RGList object called 'RG', I performed background correction and within- and between-array normalization as follows:

    RGneMLE <- backgroundCorrect(RG, method="normexp", normexp.method="mle", offset=50)
    MA <- normalizeWithinArrays(RGneMLE[RGneMLE$genes$ControlType!=-1,],
                                method="loess", bc.method="none")
    MA <- normalizeBetweenArrays(MA, method="scale")

I have produced MA-plots of the resulting pre-processed MA object for each of the 12 arrays and put them on the web so that you can take a look:

http://functionalgenomics.upf.edu/QA/MA-plots1.png
http://functionalgenomics.upf.edu/QA/MA-plots2.png

When I look at these plots I see two unexpected features:

- In the replicates of hrp36, replicate 1 of hrp38, replicate 1 of hrp40 and replicate 2 of hrp48 there are small intensity-dependent biases affecting the low average intensities (A).
- Across all replicates I see two clusters of probes with low M values (i.e., higher green signal).
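The MA-plots are built from two per-spot quantities, M = log2(R) - log2(G) and A = (log2(R) + log2(G))/2, and "loess" within-array normalization removes the trend of M on A. The base-R sketch below illustrates both steps under stated assumptions: it uses robust lowess() as a stand-in for limma's own weighted loess fit, so it only approximates normalizeWithinArrays(method="loess"); the span of 0.3 is chosen to echo what I believe is limma's default; and ma_transform, loess_normalize and the simulated data are hypothetical.

```r
# Minimal base-R sketch (not limma's implementation) of the MA transform
# and a global intensity-dependent ("loess") within-array normalization.

# M = log-ratio (red vs green), A = average log2 intensity, per spot.
ma_transform <- function(R, G) {
  list(M = log2(R) - log2(G),
       A = (log2(R) + log2(G)) / 2)
}

# Fit the trend of M on A with robust lowess() as a stand-in for limma's
# loess fit, then subtract it so the MA-plot is centred on M = 0.
loess_normalize <- function(M, A, span = 0.3) {
  fit <- lowess(A, M, f = span)
  trend <- approx(fit$x, fit$y, xout = A, ties = mean)$y
  M - trend
}

# Usage on simulated data with an artificial intensity-dependent dye bias
set.seed(1)
n <- 2000
G <- 2^runif(n, 6, 15)
bias <- 0.5 * (log2(G) - 10.5) / 4.5
R <- G * 2^(bias + rnorm(n, sd = 0.2))
ma <- ma_transform(R, G)
M_norm <- loess_normalize(ma$M, ma$A)
```

Plotting ma$A against ma$M and then against M_norm shows the tilted cloud being flattened around zero, which is the behaviour one expects from the loess step in the pipeline above.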
If I look at the image plots (generated with 'imageplot3by2(RG)'):

http://functionalgenomics.upf.edu/QA/image-Gb-1-6.png
http://functionalgenomics.upf.edu/QA/image-Gb-7-12.png

I see some lines running from the top to the bottom, but I don't know whether this is related to the issues raised above.

I've run the arrayQualityMetrics package on these data with the following command:

    arrayQualityMetrics(expressionset=RG, outdir="aqm", force=TRUE)

and put the output here:

http://functionalgenomics.upf.edu/QA/aqm/QMreport.html

According to this report there are no outlier arrays, so I'm wondering whether there are in fact no QA problems and I'm simply not using the appropriate pre-processing algorithms for this kind of data.

Thanks!
Robert
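Image plots map each spot's value back to its physical position on the slide, which is what makes streaks like the top-to-bottom lines visible. The base-R sketch below shows that idea only; spot_grid and plot_spot_grid are hypothetical helpers, not limma functions, and the sketch assumes spot coordinates are available as Row and Col columns (for Agilent files read with limma's read.maimages these usually end up in RG$genes, but check your own object). Something like plot_spot_grid(spot_grid(log2(RG$Gb[, 1]), RG$genes$Row, RG$genes$Col)) would then display the green background of the first array.

```r
# Hypothetical helpers (not part of limma): place per-spot values at
# their physical Row/Col positions so spatial artifacts stand out.
spot_grid <- function(values, row, col) {
  grid <- matrix(NA_real_, nrow = max(row), ncol = max(col))
  grid[cbind(row, col)] <- values
  grid
}

plot_spot_grid <- function(grid, main = "") {
  # image() puts matrix rows on the x axis with [1, 1] at bottom-left,
  # so flip the rows and transpose to draw array row 1 at the top
  image(t(grid[nrow(grid):1, , drop = FALSE]), col = gray.colors(64),
        axes = FALSE, main = main)
}

# Toy usage: a 20 x 30 "slide" with a brighter streak in column 12
set.seed(3)
layout <- expand.grid(Row = 1:20, Col = 1:30)
bg <- rnorm(nrow(layout), mean = 6, sd = 0.2)
bg[layout$Col == 12] <- bg[layout$Col == 12] + 1
g <- spot_grid(bg, layout$Row, layout$Col)
```

Calling plot_spot_grid(g) on the toy grid should show the streak as a single vertical line, the same kind of pattern described for the real image plots.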



Naomi Altman, United States
The weird spots are probably the Agilent quality control spots. Remove them and redo the plot.

--Naomi

At 05:53 AM 10/27/2009, Robert Castelo wrote:
> [...]

Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
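Naomi's suggestion amounts to subsetting on Agilent's ControlType annotation. As far as I know, Agilent Feature Extraction codes regular probes as 0, positive controls as 1 and negative controls as -1, so the filter in the original code (ControlType!=-1) drops only the negative controls and keeps the positive controls. With a real limma RGList the subset would be written RG[RG$genes$ControlType == 0, ]; the base-R sketch below mimics that on a hand-built list, and keep_regular_probes and the toy values are hypothetical.

```r
# Hypothetical stand-in for a limma RGList: R and G matrices (probes x
# arrays) plus a 'genes' data frame with Agilent's ControlType column.
# Keep only regular probes, assumed here to be ControlType == 0.
keep_regular_probes <- function(rg) {
  keep <- !is.na(rg$genes$ControlType) & rg$genes$ControlType == 0
  rg$R <- rg$R[keep, , drop = FALSE]
  rg$G <- rg$G[keep, , drop = FALSE]
  rg$genes <- rg$genes[keep, , drop = FALSE]
  rg
}

rg <- list(
  R = matrix(c(100, 5000, 200, 300), ncol = 1),
  G = matrix(c(110, 9000, 180, 320), ncol = 1),
  genes = data.frame(ControlType = c(0, 1, -1, 0))
)
# The original filter (ControlType != -1) still keeps the positive control
n_kept_original <- sum(rg$genes$ControlType != -1)
filtered <- keep_regular_probes(rg)
```

On the toy data the original filter keeps three spots, including the positive control, whereas the ControlType == 0 filter keeps only the two regular probes.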


Robert Castelo

Thanks Naomi, I guess this was embarrassingly obvious :-} I've made the plots without the Agilent control spots, and the two clusters with low M-values have disappeared:

http://functionalgenomics.upf.edu/QA/MA-plotsNoCtrls1.png
http://functionalgenomics.upf.edu/QA/MA-plotsNoCtrls2.png

Now the intensity-dependent biases in some of the arrays stand out even more clearly. I find them a bit weird, in the sense that the bias does not affect the bulk of the low-intensity probes but only a subset of them. I've googled this but found only success stories about removing such biases with background correction and normalization.

If I look at the MA-plots of the raw data (from the 'RG' object) excluding control spots:

http://functionalgenomics.upf.edu/QA/MA-plotsRawNoCtrls1.png
http://functionalgenomics.upf.edu/QA/MA-plotsRawNoCtrls2.png

I see the bias affecting the bulk of the low-intensity probes in those problematic arrays, so I guess the problem might be that I'm not using appropriate background correction and/or normalization algorithms.

As shown in my previous email, I'm currently using 'normexp' with 'mle' (although, if I correctly interpret a recent post from Gordon, the version I used actually employs 'saddlepoint' estimates instead of 'mle'), loess within-array normalization and scale between-array normalization.

Do you, or anybody on the list, have any hints on how I could pre-process these data to remove those artifacts?

Thanks,
Robert

On Tue, 2009-10-27 at 11:38 -0400, Naomi Altman wrote:
> The weird spots are probably the Agilent quality control
> spots. Remove them and redo the plot.
>
> --Naomi
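The "scale" between-array step mentioned above rescales the M-values of each array so that all arrays have a comparable spread. A minimal base-R sketch of that idea follows, assuming the target is the geometric mean of the per-array median absolute M-values; to my understanding this matches what limma does for M-values in normalizeBetweenArrays(method="scale"), but the code is an independent illustration, and scale_normalize and the simulated arrays are hypothetical.

```r
# Sketch of between-array "scale" normalization of M-values: every array
# (column) is rescaled so that all arrays share the same median absolute
# M-value, here the geometric mean of the per-array medians.
scale_normalize <- function(M) {
  med_abs <- apply(abs(M), 2, median, na.rm = TRUE)
  target <- exp(mean(log(med_abs)))
  sweep(M, 2, target / med_abs, `*`)
}

# Usage: three simulated arrays whose M-values differ only in spread
set.seed(2)
M <- cbind(rnorm(1000, sd = 0.3), rnorm(1000, sd = 0.6), rnorm(1000, sd = 0.45))
M_scaled <- scale_normalize(M)
med_after <- apply(abs(M_scaled), 2, median)
```

Because each column is multiplied by a single positive factor, the sign of every M-value and the within-array ordering are preserved; only the between-array spread is equalized, which is why this step cannot by itself remove an intensity-dependent bias within an array.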


Robert Castelo

Naomi, I think you can dismiss my previous email below. I really wanted to try the normexp method with MLE estimates and couldn't wait until Gordon's patch for using the normexp-mle method with an RGList showed up on the Bioconductor server. So I hacked around it: I extracted the matrices of red and green intensities from the RG object and applied the method to each matrix, which, according to Gordon, does the expected job, and then pasted the results back into my RGList object. After the normalization steps, using the normexp method with MLE estimates, the biases have been successfully removed:

http://functionalgenomics.upf.edu/QA/MA-plotsNoCtrlsNEmle1.png
http://functionalgenomics.upf.edu/QA/MA-plotsNoCtrlsNEmle2.png

Naomi, thanks again. I should also thank Tobias Straub for raising the issue with the implementation of the normexp method, James McDonald for drawing Gordon's attention to it and, of course, Gordon Smyth for clearing up the issue so quickly.

Robert

On Wed, 2009-10-28 at 10:31 +0100, Robert Castelo wrote:
> [...]
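To show what normexp with maximum-likelihood estimates computes, here is a minimal base-R sketch of the normal-exponential model: an observed intensity x = b + s, with background noise b ~ N(mu, sigma^2) and signal s ~ Exponential with mean alpha. The parameters are fitted by maximizing the likelihood with optim(), and each intensity is replaced by the posterior mean E[s | x] plus an offset, mirroring offset=50 in the original call. normexp_mle_fit and normexp_signal are hypothetical names, the standardization, starting values and optimizer settings are my own choices, and limma's code differs in detail. In recent limma versions the matrix route described above should correspond to something like backgroundCorrect.matrix(RG$R, RG$Rb, method="normexp", normexp.method="mle", offset=50), and likewise for the green channel, but check the documentation of your installed version.

```r
# Minimal base-R sketch of normexp background correction with MLE fitting.
# Model: x = b + s, b ~ N(mu, sigma^2), s ~ Exponential(mean = alpha).
# Illustration only; this is not limma's implementation.
normexp_mle_fit <- function(x) {
  # Standardize so the optimizer sees O(1) parameters; the model is
  # closed under location/scale changes, so we can transform back.
  m <- median(x)
  s <- mad(x)
  xs <- (x - m) / s
  negloglik <- function(par) {
    mu <- par[1]; sigma <- exp(par[2]); alpha <- exp(par[3])
    z <- (xs - mu - sigma^2 / alpha) / sigma
    # log density of the normal + exponential convolution
    ll <- -log(alpha) - (xs - mu) / alpha + sigma^2 / (2 * alpha^2) +
      pnorm(z, log.p = TRUE)
    -sum(ll)
  }
  low <- xs[xs <= quantile(xs, 0.1)]
  mu0 <- unname(quantile(xs, 0.05))
  start <- c(mu0,
             log(max(sd(low), 0.01)),
             log(max(mean(xs) - mu0, 0.1)))
  opt <- optim(start, negloglik, method = "Nelder-Mead",
               control = list(maxit = 5000, reltol = 1e-10))
  # back-transform to the original intensity scale
  list(mu = m + s * opt$par[1],
       sigma = s * exp(opt$par[2]),
       alpha = s * exp(opt$par[3]),
       convergence = opt$convergence)
}

normexp_signal <- function(x, fit, offset = 0) {
  # E[s | x] is the mean of N(mu_sf, sigma^2) truncated to s > 0
  mu_sf <- x - fit$mu - fit$sigma^2 / fit$alpha
  z <- mu_sf / fit$sigma
  ratio <- exp(dnorm(z, log = TRUE) - pnorm(z, log.p = TRUE))
  pmax(mu_sf + fit$sigma * ratio, 1e-6) + offset  # pmax guards round-off
}

# Usage on simulated intensities with known parameters
set.seed(42)
n <- 5000
true_s <- rexp(n, rate = 1 / 300)
x <- rnorm(n, mean = 100, sd = 15) + true_s
fit <- normexp_mle_fit(x)
adj <- normexp_signal(x, fit, offset = 50)
```

Because E[s | x] is always positive, the corrected values never go below the offset, so low-intensity spots are shrunk towards the offset rather than producing the unstable log-ratios that simple background subtraction can give. This is one plausible reason why the choice of estimator for the normexp parameters can visibly change the low-A end of an MA-plot.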

