different density

Jakub Mieczkowski ▴ 20

@jakub-mieczkowski-2548

Last seen 11.3 years ago

Hi all,

I'm new to Bioconductor and I want to analyse time-course data (6 time points, 3 oligo arrays at each). During quality control (QCReport) I found that 4 arrays have different densities. They are shown here:

http://students.mimuw.edu.pl/~jm214641/BoxANDden.pdf

The NUSE plot shows differences too. The weight images are a little different from the rest, but I can't see any artefacts. Three of the four arrays are from the same time point.

Should I remove them from further analysis (the differences could have a biological basis)? Or maybe I just can't use methods like RMA (because of the different distributions)? Do you have any suggestions?

Thanks,
Kuba

oligo oligo • 1.3k views

ADD COMMENT • link updated 18.0 years ago by Naomi Altman ★ 6.0k • written 18.0 years ago by Jakub Mieczkowski ▴ 20

Sean Davis 21k

@sean-davis-490

Last seen 10 months ago

United States

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071217/846ca465/attachment.pl

ADD COMMENT • link 18.0 years ago Sean Davis 21k

First of all, thank you very much for the response. Unfortunately I don't understand what you mean by looking closely. I've only got the .CEL files and I have no idea what else I can do. The QCReport is available here:

http://students.mimuw.edu.pl/%7Ejm214641/AffyQCReport.pdf

On the RLE and RNAdeg plots I can't distinguish the 4 "outliers" from the rest.

How can I check what was measured (background or signal)? Should I use the P/M/A method or something different? Are there any quality control methods other than QCReport, RLE, NUSE and image analysis (residuals, weights)? Maybe, in this situation, some pre-processing methods are better than others? Maybe a linear transformation can help?

Thank you,
Kuba

Sean Davis wrote:

> Hi, Kuba. You will probably need to look closely at the QC information on these arrays, but I would be concerned that these arrays didn't work for one reason or another, given the much lower intensities associated with your four "outlier arrays". I do not think I would blindly apply RMA to those arrays without getting a better sense of whether or not they are measuring something and not just representing mostly background signal.
>
> Sean

ADD REPLY • link 18.0 years ago Jakub Mieczkowski ▴ 20
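
One rough way to put a number on "different densities" is to compare the median log2 intensity of each array with the median across all arrays. The sketch below does this on simulated data in plain Python; it is not what QCReport computes, and the function names, the threshold of 1 log2 unit, and the simulated shift of the four low arrays are all assumptions made for illustration.

```python
import random
import statistics


def summarize(log_intensities):
    """Return (median, interquartile range) of one array's log2 intensities."""
    q1, q2, q3 = statistics.quantiles(log_intensities, n=4)
    return q2, q3 - q1


def flag_shifted_arrays(arrays, threshold=1.0):
    """Flag arrays whose median log2 intensity lies more than `threshold`
    log2 units away from the median of all array medians."""
    medians = {name: summarize(values)[0] for name, values in arrays.items()}
    centre = statistics.median(medians.values())
    return sorted(name for name, m in medians.items() if abs(m - centre) > threshold)


# Simulated data (assumption): 18 arrays, 4 of them with much lower intensities.
rng = random.Random(1)
arrays = {}
for i in range(18):
    shift = -2.0 if i < 4 else 0.0  # hypothetical shift of the "outlier" arrays
    arrays[f"array{i:02d}"] = [rng.gauss(8.0 + shift, 1.5) for _ in range(2000)]

flagged = flag_shifted_arrays(arrays)
print("arrays with shifted intensity distributions:", flagged)
```

A flag from a check like this only says that an array's distribution sits apart from the others; it does not say whether the cause is technical or biological.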

A plot that is often quite informative is log(exprs) vs log(exprs) for the unnormalized probes from replicate arrays (or just log(pm) vs log(pm)). If the arrays are "good", the technical replicates have high correlation and are tightly clustered on the diagonal of this plot, and biological replicates are shaped more like an American football (a bit more pointy at the extremes than an ellipse).

Bad arrays are either much more scattered, do not show a diagonal trend, or may be jammed into the upper or lower section of the plot.

--Naomi

Naomi S. Altman, Associate Professor
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)

ADD REPLY • link 18.0 years ago Naomi Altman ★ 6.0k
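
The replicate comparison described above can be reduced to a single number per pair of arrays: the correlation of their log intensities. The following is a minimal sketch on simulated data in plain Python, not a Bioconductor workflow; the noise levels and the "failed" array that carries no signal are assumptions chosen only to show the contrast.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)
n_probes = 3000

# Simulated log2 probe signal shared by the replicates (assumption).
truth = [rng.gauss(8.0, 2.0) for _ in range(n_probes)]

# Two good replicates: the same signal plus a little independent noise.
rep1 = [t + rng.gauss(0.0, 0.2) for t in truth]
rep2 = [t + rng.gauss(0.0, 0.2) for t in truth]

# A hypothetical failed array: values unrelated to the underlying signal.
failed = [rng.gauss(6.0, 0.5) for _ in range(n_probes)]

r_good = pearson(rep1, rep2)
r_bad = pearson(rep1, failed)
print(f"good replicate pair: r = {r_good:.3f}")
print(f"good vs failed array: r = {r_bad:.3f}")
```

A scatter plot of the same pairs carries more information than the correlation alone, since it also shows whether the points are pressed against the lower or upper edge of the plot.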

On 19/12/2007, Naomi Altman wrote:

> A plot that is often quite informative is log(exprs) vs log(exprs) for the unnormalized probes from replicate arrays (or just log(pm) vs log(pm)). If the arrays are "good", the technical replicates have high correlation and are tightly clustered on the diagonal of this plot, and biological replicates are shaped more like an American football (a bit more pointy at the extremes than an ellipse).
>
> Bad arrays are either much more scattered, do not show a diagonal trend, or may be jammed into the upper or lower section of the plot.

I can agree with saturation effects (and partly the amount of scatter), but *absolutely not* with such things as non-linear discrepancies away from the diagonal on the logarithmic scale.

If you plot data in (log(y1), log(y2)) and see nonlinearities, that is very often due to the simple fact that you have taken the logarithmic transform of signals that have a bit of offset ("background"). If you instead plot (y1,y2) you'll often find that the data lie on a nice straight line. The curvature comes from the fact that the line you can fit through the data cloud does not pass through the origin (0,0).

It is more common to discuss the above effects in a log-ratio vs log-intensity plot, that is, rotate the data to (A,M) where M=log2(y1/y2) and A=log2(y1*y2)/2. Then the data "should be" along M=0, but the offset and the logarithmic transform will make it bend like a banana, roughly the same way as in (log(y1), log(y2)), just rotated and rescaled.

Now to the apparent noise levels in log-ratios: even if you have no banana shape in (A,M) you can still have offset in the data. This happens when the offset is effectively the same in both channels (when taking into account differences in scales). You can easily try this yourself. Take data that is nice and straight along the M=0 line in an M vs A plot. Then go back to the intensity scale and add the same offset 'a' (say a=500) to both channels, i.e. y1' <- y1 + a and y2' <- y2 + a, and calculate M'=log2(y1'/y2') and A'=log2(y1'*y2')/2. When you plot (A',M') the data is still straight and along M'=0. However, we do know there is offset because we added it! Even "worse", if you look at the spread of {M'} compared with the spread of {M}, you'll find that M' is much "cleaner": as you increase 'a' it goes from being a "funnel", to an "American football", to a "lentil", and finally it will be sucked up into a "black hole".

In other words, evaluating quality by looking at the variance in M is dangerous and deceptive if you're not careful. If you think about it, in a perfect world without offset but with noise, your log-ratios will/should have infinite variance for signals close to zero, e.g. "log2(0/0)". (How to deal with this fact is a different issue.)

To summarize, don't throw out samples/arrays just because their (log(y1),log(y2)) or (A,M) plots look like a banana, or because their log-ratios (M) blow up at lower log-intensities (A). Such effects can be fixed by using the *correct* calibration/normalization. Microarray experiments still cost money, and RNA/DNA might be scarce.

In order to stop myself from ranting more about this here, please read the following instead:

H. Bengtsson and O. Hössjer, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method, BMC Bioinformatics, 2006, 7:100.
http://www.biomedcentral.com/1471-2105/7/100/

(It has references to other papers that also deal with this problem, although they are less explicit about it.)

Cheers

Henrik

ADD REPLY • link 18.0 years ago Henrik Bengtsson ★ 2.4k
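
The experiment Henrik suggests (add the same offset 'a' to both channels and watch the spread of M shrink) is easy to reproduce on simulated data. The sketch below is plain Python rather than R; the log-normal signal, the multiplicative noise level, and the extra offset of 5000 are assumptions, while a=500 and the definitions of M and A are taken from the post.

```python
import math
import random
import statistics


def ma_values(y1, y2):
    """Return (A, M) lists with M = log2(y1/y2) and A = log2(y1*y2)/2."""
    m = [math.log2(u / v) for u, v in zip(y1, y2)]
    a = [0.5 * math.log2(u * v) for u, v in zip(y1, y2)]
    return a, m


rng = random.Random(7)

# Simulated offset-free intensities (assumption): a shared log-normal signal
# with independent multiplicative noise in each channel, so M is centred on 0.
signal = [2.0 ** rng.gauss(8.0, 2.0) for _ in range(5000)]
y1 = [s * 2.0 ** rng.gauss(0.0, 0.3) for s in signal]
y2 = [s * 2.0 ** rng.gauss(0.0, 0.3) for s in signal]

spread = {}
for offset in (0.0, 500.0, 5000.0):
    # Add the same offset to both channels, then recompute the log-ratios.
    _, m = ma_values([u + offset for u in y1], [v + offset for v in y2])
    spread[offset] = statistics.pstdev(m)
    print(f"offset a = {offset:6.0f}: sd(M) = {spread[offset]:.3f}")
```

The standard deviation of M drops as the offset grows even though the underlying noise is unchanged, which is the reason a tight M vs A cloud is not, on its own, evidence of a good array.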

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 4.7 years ago

United States

Hi folks,

I did not mean that one should look at nonlinearity of the main trend. If the RNA is really bad, the scatter can either fill the entire rectangle, or the data on one array are up against the lower or upper boundary of the plot.

Curvature should be fixable by normalization.

Sorry for the misunderstanding.

--Naomi

ADD COMMENT • link 18.0 years ago Naomi Altman ★ 6.0k
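
To illustrate why curvature alone is fixable, the sketch below simulates two arrays that differ only by an additive offset on one of them, measures the resulting bend in the M vs A trend, and shows that the bend disappears once the offset is subtracted. It is plain Python on simulated data; the offset of 200 and the helper `trend_gap` are hypothetical, and the correction uses the known simulated offset, whereas on real data the offset would have to be estimated by a normalization method.

```python
import math
import random
import statistics


def trend_gap(y1, y2):
    """Mean M in the top quarter of A minus mean M in the bottom quarter,
    with M = log2(y1/y2) and A = log2(y1*y2)/2. Near 0 for a flat trend."""
    pairs = sorted(
        (0.5 * math.log2(u * v), math.log2(u / v)) for u, v in zip(y1, y2)
    )
    quarter = len(pairs) // 4
    low = statistics.fmean(m for _, m in pairs[:quarter])
    high = statistics.fmean(m for _, m in pairs[-quarter:])
    return high - low


rng = random.Random(3)
signal = [2.0 ** rng.gauss(8.0, 2.0) for _ in range(4000)]
offset = 200.0  # hypothetical background added to the second array only

y1 = [s * 2.0 ** rng.gauss(0.0, 0.2) for s in signal]
y2 = [s * 2.0 ** rng.gauss(0.0, 0.2) + offset for s in signal]

gap_raw = trend_gap(y1, y2)                            # banana-shaped trend
gap_fixed = trend_gap(y1, [v - offset for v in y2])    # offset removed

print(f"trend gap with offset:    {gap_raw:.2f}")
print(f"trend gap after removal:  {gap_fixed:.2f}")
```

Scatter that fills the whole plot, or points pressed against its boundary, would not be repaired by a correction of this kind.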
