Hi folks,
I did not mean that one should look at
nonlinearity of the main trend. If the RNA is
really bad, either the scatter fills the
entire rectangle or the data from one array are
pushed up against the lower or upper boundary of the
plot. Curvature should be fixable by normalization.
Sorry for the misunderstanding.
--Naomi
At 03:35 PM 12/19/2007, Henrik Bengtsson wrote:
>On 19/12/2007, Naomi Altman <naomi at stat.psu.edu> wrote:
> > A plot that is often quite informative is log(exprs) vs log(exprs)
> > for the unnormalized probes from replicate arrays (or just log(pm) vs
> > log(pm)). If the arrays are "good" the technical replicates have
> > high correlation and are tightly clustered on the diagonal of this
> > plot, and biological replicates are shaped more like an American
> > football - not a bit more pointy at the extremes than an ellipse.
> >
> > Bad arrays are either much more scattered, do not show a diagonal
> > trend or may be jammed into the upper or lower section of the plot.
>
>I can agree with saturation effects (and partly amount of scatter),
>but *absolutely not* such things as non-linear discrepancies away from
>diagonal on the logarithmic scale.
>
>If you plot data in (log(y1), log(y2)) and see nonlinearities, that is
>very often due to the simple fact that you have taken the logarithmic
>transform of signals that carry a bit of offset ("background"). If you
>instead plot (y1,y2) you'll often find that the data lie on a nice
>straight line. The curvature comes from the fact that the line you
>can fit through the data cloud does not pass through the origin (0,0).
>
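A minimal sketch of the offset argument above, in Python with made-up numbers. It assumes y2 is an exact straight line in y1 on the intensity scale, y2 = offset + scale*y1, and measures the slope between two points on the log2-log2 scale. The function name, the offset of 200 and the intensity ranges are assumptions for illustration, not values from this thread.

```python
import math

def loglog_slope(x_lo, x_hi, offset, scale=1.0):
    """Slope of log2(y2) against log2(y1) between two intensities,
    where y2 = offset + scale * y1 is exactly linear on the raw scale."""
    y_lo = offset + scale * x_lo
    y_hi = offset + scale * x_hi
    return (math.log2(y_hi) - math.log2(y_lo)) / (math.log2(x_hi) - math.log2(x_lo))

OFFSET = 200.0  # hypothetical background offset
RANGES = [(50, 100), (500, 1000), (5000, 10000), (50000, 100000)]

for lo, hi in RANGES:
    no_offset = loglog_slope(lo, hi, 0.0)
    with_offset = loglog_slope(lo, hi, OFFSET)
    print(f"y1 in [{lo}, {hi}]: slope without offset {no_offset:.3f}, "
          f"with offset {with_offset:.3f}")
```

Without an offset the log-log slope is 1 over every range. With a positive offset it is well below 1 at low intensities and approaches 1 at high intensities, which is the curvature seen in the (log(y1), log(y2)) plot even though (y1, y2) is a perfect straight line.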
>It is more common to discuss the above effects in a log-ratio
>log-intensity plot, that is, rotate the data to (A,M) where
>M=log2(y1/y2) and A=log2(y1*y2)/2. Then data "should be" along M=0,
>but the offset and the logarithmic transform will make it bend like a
>banana. Roughly the same way as in (log(y1), log(y2)) just rotated
>and rescaled.
>
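The (A,M) rotation and the resulting "banana" can be sketched in a few lines of Python. This is an illustration only: the helper name `to_ma`, the offset of 200 added to one channel, and the four signal levels are assumptions, not data from this thread.

```python
import math

def to_ma(y1, y2):
    """Return (A, M) for one pair of positive intensities:
    M = log2(y1/y2), A = log2(y1*y2)/2."""
    m = math.log2(y1 / y2)
    a = math.log2(y1 * y2) / 2
    return a, m

OFFSET = 200.0  # hypothetical offset present in the second channel only
TRUE_SIGNAL = [50.0, 500.0, 5000.0, 50000.0]

# Both channels measure the same true signal; only the offset differs.
banana = [to_ma(x, x + OFFSET) for x in TRUE_SIGNAL]
for a, m in banana:
    print(f"A = {a:6.2f}   M = {m:7.3f}")
```

With no real differential expression, M is strongly negative at low A and tends to 0 as A grows, so the points trace a curve instead of lying along M=0.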
>Now to the apparent noise levels in log-ratios: Even if you have no
>banana shape in (A,M) you can still have offset in the data. This
>happens when the offset is effectively the same in both channels (when
>taking into account differences in scales). You can easily try this yourself.
>Take data that is nice and straight along the M=0 line in an M vs A
>plot. Then, go back to the intensity scale and add the same offset
>'a' (say a=500) to both channels, i.e. y1' <- y1 + a and y2' <- y2 + a,
>and calculate M'=log2(y1'/y2') and A'=log2(y1'*y2')/2. When you
>plot (A',M') the data is still straight and along M'=0. However, we
>do know there is offset because we added it! Ok, even "worse": if you
>look at the spread of {M'} compared with the spread of {M}, you'll
>find that M' is much "cleaner" - when you increase 'a' it goes from
>being a "funnel", to an "American football", to a "lentil", and finally
>it will be sucked up in a "black hole".
>
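The experiment described above can be tried in a few lines of Python. This is a rough simulation under stated assumptions: the two channels share a log-uniform true signal with independent multiplicative noise, and "spread" is taken as the root mean square of M about the M=0 line. The function names, the noise level and the seed are all made up for illustration.

```python
import math
import random

def log_ratios(y1, y2, offset=0.0):
    """M' = log2((y1 + a) / (y2 + a)) for each pair, with the same a in both channels."""
    return [math.log2((u + offset) / (v + offset)) for u, v in zip(y1, y2)]

def rms(values):
    """Root mean square, i.e. spread measured about the M = 0 line."""
    return math.sqrt(sum(v * v for v in values) / len(values))

rng = random.Random(1)  # fixed seed so the run is repeatable
signal = [2 ** rng.uniform(6, 16) for _ in range(2000)]  # assumed true intensities
y1 = [s * math.exp(rng.gauss(0, 0.2)) for s in signal]   # assumed multiplicative noise
y2 = [s * math.exp(rng.gauss(0, 0.2)) for s in signal]

OFFSETS = (0.0, 500.0, 5000.0, 50000.0)
spreads = [rms(log_ratios(y1, y2, a)) for a in OFFSETS]
for a, s in zip(OFFSETS, spreads):
    print(f"a = {a:8.0f}   RMS of M' = {s:.4f}")
```

The spread shrinks every time 'a' grows, because (y1+a)/(y2+a) moves toward 1 for every pair. The data look "cleaner" although nothing about the measurement improved.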
>In other words, evaluating quality by looking at the variance in M is
>dangerous and deceptive, if you're not careful. If you think about
>it, in the perfect world without offset but with noise, your
>log-ratios will/should have infinite variance for signals close to
>zero, e.g. "log2(0/0)". (How to deal with this fact is a different
>issue).
>
>To summarize, don't throw out samples/arrays just because their
>(log(y1),log(y2)) or (A,M) plots look like a banana, or if their
>log-ratios (M) blow up at lower log-intensities (A). Such effects can
>be fixed by using the *correct* calibration/normalization. Microarray
>experiments still cost money and RNA/DNA might be scarce.
>
>In order to stop myself from ranting more about this here, please read
>the following instead:
>
>H. Bengtsson and O. Hössjer, Methodological study of affine
>transformations of gene expression data with proposed robust
>non-parametric multi-dimensional normalization method, BMC
>Bioinformatics, 2006, 7:100.
>
>http://www.biomedcentral.com/1471-2105/7/100/
>
>(It has references to other papers also dealing with this problem,
>although they are less explicit about it)
>
>Cheers
>
>Henrik
>
> >
> > --Naomi
> >
> >
> > At 05:30 PM 12/18/2007, Jakub Mieczkowski wrote:
> > >First of all, thank you very much for the response.
> > >Unfortunately I don't understand what you mean when you say I should look
> > >closely. I've got only .CEL files and I have no idea what else I can do.
> > >The QCReport is available here:
> > >
> > >http://students.mimuw.edu.pl/%7Ejm214641/AffyQCReport.pdf
> > >
> > >On the RLE and RNAdeg plots I can't distinguish the 4 "outliers" from the rest.
> > >
> > >How can I check what was measured (background or signal)? Should I use
> > >the P/M/A method or something different? Are there any other quality control
> > >methods besides QCReport, RLE, NUSE and image analysis (residuals,
> > >weights)? Maybe, in this situation, some pre-processing methods are
> > >better than others? Maybe a linear transformation can help?
> > >Thank you,
> > >Kuba
> > >
> > >Sean Davis wrote:
> > > >
> > > > On Dec 17, 2007 5:28 PM, Jakub Mieczkowski <kubamieczkowski at op.pl> wrote:
> > > >
> > > > Hi All,
> > > > I'm new to Bioconductor and I want to analyse time course data (6 time
> > > > points, 3 oligo arrays in each). During the quality control
> > > > (QCReport) I found that 4 arrays have different densities, as shown here:
> > > >
> > > > http://students.mimuw.edu.pl/~jm214641/BoxANDden.pdf
> > > >
> > > > The NUSE plot shows differences too. The images of weights are a little bit
> > > > different from the rest, but I can't see any artefacts.
> > > > 3 of them are from the same time point.
> > > >
> > > > Should I remove them from further analysis (the differences can have
> > > > a biological basis)? Or maybe I just can't use methods like RMA (because
> > > > of the different distributions)? Do you have any suggestions?
> > > >
> > > >
> > > > Hi, Kuba. You will probably need to look closely at the QC information
> > > > on these arrays, but I would be concerned that these arrays didn't work
> > > > for one reason or another given the much lower intensities associated
> > > > with your four "outlier arrays". I do not think I would blindly apply
> > > > RMA to those arrays without getting a better sense of whether or not
> > > > they are measuring something and not just representing mostly background
> > > > signal.
> > > >
> > > > Sean
> > > >
> > > >
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor at stat.math.ethz.ch
> > >https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >Search the archives:
> > >http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > Naomi S. Altman 814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics 814-863-7114 (fax)
> > Penn State University 814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111