Hi there,
I have a question about limmaGUI. How are spot quality measures used?
I am wondering because I am using a new spot finding package that
generates confidence values on a per spot basis. Can these be used
when loading the data? How will they be used?
Any help is much appreciated,
Liz Brooke-Powell
Molteno Building
Department of Pathology
University of Cambridge
Tennis Court Road
Cambridge, CB2 1QP
United Kingdom
Website: http://www.path.cam.ac.uk/~toxo/
Tel 01223 33 33 31(office) or 01223 33 33 29 (lab)
Hi Liz,
limmaGUI is not as flexible as limma when it comes to spot
quality measures for "new spot finding packages". Please tell
us the column name(s) from your raw image-analysis results files
which you want to use for assessing quality, and if you can
explain what the quality indicator in this column means (e.g.
high=good, low=bad, ...), that would be even better.
Try the limmaGUI spot-quality-weighting option for GenePix.
(Even if you don't have any GenePix files, you can just
pretend you do have GenePix files in order to see the
spot-quality weighting dialog.) You can give different weights
to different GenePix flags (for "bad" spots or "not found"
spots etc.) Is this the sort of thing you are looking for?
The extra quality column(s) are read in when the raw data is
read in, and then they are used to form weights in the
normalization routines in limma.
Type:
?normalizeWithinArrays
OR
?wtflags (not as flexible as the limmaGUI GenePix flags dialog)
at the R prompt for a bit more information.
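As a rough illustration of how flags can be turned into weights, here is a minimal base-R sketch. The flag values, the weights assigned to them and the variable names are assumptions made for the example; this is not the code limmaGUI or limma actually runs.

```r
# Hypothetical GenePix-style flags for eight spots (values are made up).
# Assumed coding: 0 = unflagged, -50 = "not found", -100 = "bad".
flags <- c(0, 0, -50, 0, -100, 0, -50, 0)

# Assumed weighting scheme: full weight for unflagged spots, a small
# weight for "not found" spots and zero weight for "bad" spots.
flag_weights <- c("0" = 1, "-50" = 0.1, "-100" = 0)

# Look up each spot's weight by its flag value.
weights <- unname(flag_weights[as.character(flags)])
print(weights)
```

A weights vector built this way has one entry per spot and could then be passed to a weighted normalization routine.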
Regards,
James
--
James Wettenhall
Division of Genetics and Bioinformatics
The Walter & Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3050, Australia
Tel: (+61 3) 9345 2629   Fax: (+61 3) 9347 0852
Mobile: (+61 / 0 ) 438 527 921
E-mail: wettenhall@wehi.edu.au
http://www.wehi.edu.au
Hi James,
The confidence values are given as decimals, with 1 = 100% confident
(e.g. confidence value = 0.78). The value is determined using Bayesian
statistics and is a measure of how confident the package is that the
spot it found is real. The package itself (BlueFuse, currently
available only in the UK) uses a Bayesian model to find spots
iteratively. I don't know much more as it's protected, and I'm a
biologist.
Basically I am asking if the model can take account of these numbers
and adjust the model appropriately. I am not sure in this case that
pretending to have GenePix will work, as the numbers are not a simple
0 or 1 (good or bad). If I were to try this, would I need to format the
txt file of data to look like a GenePix file?
Thanks for your help,
Liz
Liz,
On Fri, 2 Jul 2004, Elizabeth Brooke-Powell wrote:
> adjust the model appropriately. I am not sure in this case that
> pretending to have GenePix will work as the numbers are not a simple
> 0 or 1 (good or bad).
No, sorry I didn't mean to imply that you would be able to just
use the GenePix option in limmaGUI as is. I just thought it
might be helpful for you to learn how weights can be defined
(for _GenePix_ data), based on GenePix spot flags. Notice that
the weights we define for the GenePix flags are between 0 and
1, just as your "quality weights" already are. But after we
process GenePix data, the number of _different_ values in the
weights column would be small, e.g. in this weights vector:
(1,1,1,1,0.1,1,1,1,1,1,1,0,1,1,1,1,1,0.1,0.1,1,1,1,1,1),
there are only three _different_ weight values (0, 0.1 and 1),
whereas for your data, the column of weights (between 0 and 1)
could contain lots of different weight values between 0 and 1
for the different genes.
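The contrast between the two kinds of weights can be checked in a few lines of base R. The first vector is the one quoted above; the second is a made-up set of continuous confidence values, used only for illustration.

```r
# The GenePix-style weights vector quoted above.
w_flags <- c(1,1,1,1,0.1,1,1,1,1,1,1,0,1,1,1,1,1,0.1,0.1,1,1,1,1,1)

# Only three distinct values occur, however many spots there are.
distinct_flags <- sort(unique(w_flags))
print(distinct_flags)
print(table(w_flags))

# Made-up continuous confidence values (one per spot); here every spot
# can carry a different weight between 0 and 1.
w_conf <- c(0.78, 0.95, 0.12, 0.99, 0.64, 0.88, 0.31, 0.97)
print(length(unique(w_conf)))
```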
I don't think you have told us the column name of this
quality weight yet.
Maybe you should ask the statistician who designed this
quality weighting how he/she intended that it be used in
normalization. But it can probably be used directly in
limma's normalization, and all you would have to do is tell us
the appropriate column names which limma would need to read in
for your data (Rf, Rb, Gf, Gb and Spot-Quality-Weighting) and
then we can add an option to limma/limmaGUI to allow it to read
in the appropriate columns for BlueFuse including the quality
weights.
There are no plans at the moment to add a custom-dialog to
limmaGUI for reading in an arbitrary column of weights from your
raw image-analysis files. But if you want to start combining
the command-line interface with the GUI interface, you could
read the weights into RG$weights in limmaGUIenvironment. Then
they would be automatically used for normalization.
(1) From the R console :
RG <- get("RG",envir=limmaGUIenvironment)
names(RG)
RG$weights <- ...
names(RG)
assign("RG",RG,limmaGUIenvironment)
OR
(2) From the "Evaluate R Code" menu:
RG$weights <- ...
(In case (2), when using the "Evaluate R Code" menu, your R
commands are automatically evaluated in limmaGUIenvironment
which contains all of your microarray data objects used by
limmaGUI.)
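To make the get/modify/assign pattern concrete, here is a self-contained sketch that runs without limmaGUI. The environment gui_env, the small RG list and the confidence matrix are all hypothetical stand-ins; in a real session the environment would be limmaGUIenvironment and the weights would come from your own image-analysis files.

```r
# Stand-in for limmaGUIenvironment so the sketch runs on its own
# (assumption: in a real session, use limmaGUIenvironment itself).
gui_env <- new.env()

# Stand-in RG object: 4 spots x 2 arrays of made-up intensities.
assign("RG",
       list(R = matrix(c(500, 1200, 80, 3000, 450, 1100, 95, 2800), 4, 2),
            G = matrix(c(480, 600, 90, 3100, 470, 580, 85, 2900), 4, 2)),
       envir = gui_env)

# Hypothetical per-spot confidence values, one column per array.
confidence <- matrix(c(0.78, 0.95, 0.12, 0.99, 0.81, 0.90, 0.20, 0.97), 4, 2)

# Fetch RG, attach the weights and store the object back.
RG <- get("RG", envir = gui_env)
stopifnot(all(dim(confidence) == dim(RG$R)))  # weights must match the data
RG$weights <- confidence
assign("RG", RG, envir = gui_env)

print(names(get("RG", envir = gui_env)))
```

The dimension check is there because a weights matrix is only meaningful if it has one entry per spot per array.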
Regards,
James
Sorry James,
Here are the column titles:
ROW
COL
SUBGRIDROW
SUBGRIDCOL
SPOTNUM
BLOCK
NAME
ID
CONFIDENCE
FLAG
MAN EXCL
AMPCH1
AMPCH2
RATIO CH1/CH2
LOG2RATIO CH1/CH2
LOG10RATIO CH1/CH2
RATIO CH2/CH1
LOG2RATIO CH2/CH1
LOG10RATIO CH2/CH1
SUM
PELROW
PELCOL
I have previously used the other function in limmaGUI, with AMPCH1 and
AMPCH2 as the signal channels. There is no background data, as the
background is taken account of in the model. The column labelled
CONFIDENCE is obviously the one in question.
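For readers who want to pull these columns into R by hand, here is a minimal base-R sketch. The three data rows are invented, only a few of the column titles listed above are included, and the sketch assumes the ratio column is simply AMPCH1 divided by AMPCH2.

```r
# Made-up tab-delimited extract using a few of the column titles above.
txt <- paste(
  "NAME\tCONFIDENCE\tAMPCH1\tAMPCH2\tRATIO CH1/CH2",
  "geneA\t0.78\t1200\t600\t2",
  "geneB\t0.95\t500\t1000\t0.5",
  "geneC\t0.12\t90\t30\t3",
  sep = "\n")

# check.names = FALSE keeps titles such as "RATIO CH1/CH2" unchanged.
spots <- read.delim(text = txt, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Recompute the ratio and log-ratio from the two channel columns
# (assumption: the file's ratio column is AMPCH1 / AMPCH2).
ratio <- spots$AMPCH1 / spots$AMPCH2
M <- log2(ratio)
print(all.equal(ratio, spots[["RATIO CH1/CH2"]]))

# One row per spot: log-ratio plus the confidence value as a weight.
print(data.frame(NAME = spots$NAME, M = M, weight = spots$CONFIDENCE))
```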
Thanks for your help,
Liz
At 11:13 PM 2/07/2004, Elizabeth Brooke-Powell wrote:
>Hi James,
>
>The confidence values are give in numbers as decimals with 1 = 100%
>confident (e.g. confidence value = 0.78) this is a value determined
>using Bayesian statistics and is a measure of how confident the
>package is that the spot it found is real. The package itself
>(BlueFuse only currently available in the UK) uses a Bayesian model
>to iteratively find spots looking. I don't know much more as it's
>protected, and I'm a biologist.
>
>Basically I am asking if the model can take account of these numbers
>and adjust the model appropriately.
The answer is yes, in principle, but not without knowing how
BlueFuse's "confidence value" is defined and what it means. Is the
confidence value a probability? If so, of what? Is it a weight or an
inverse variance? If so, of what? How does "confidence value" interact
with the FLAG column included in BlueFuse output? You might not be
able to answer these questions yourself but the BlueFuse developers
can. I have not been able to find technical information on the
BlueFuse www site sufficient to answer these questions.
Without knowing anything further, I would be inclined to treat the
"confidence values" directly as weights in limma normalization and
differential expression analyses. This is simple to do in principle,
but it is not clear how to read the data in. The BlueFuse format is
different from that of other two-color image analysis programs. Is the
RATIO column in the BlueFuse output the same as AMPCH1 divided by
AMPCH2? If not, what are AMPCH1 and AMPCH2? We need to know this.
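To show what "treating the confidence values directly as weights" means numerically, here is a small base-R sketch for a single gene. The log-ratios and confidence values are invented, and the sketch uses plain weighted.mean and lm rather than limma's own routines.

```r
# Made-up log-ratios (M-values) for one gene on four replicate arrays,
# with hypothetical confidence values used directly as weights.
M <- c(1.10, 0.95, 2.40, 1.05)
confidence <- c(0.95, 0.90, 0.10, 0.85)

# The low-confidence spot (2.40) pulls the plain mean upwards but has
# little influence on the weighted mean.
unweighted <- mean(M)
weighted <- weighted.mean(M, w = confidence)

# The same weighted estimate from a weighted least-squares fit.
fit <- lm(M ~ 1, weights = confidence)

print(c(unweighted = unweighted, weighted = weighted,
        lm = unname(coef(fit))))
```

The intercept of the weighted fit and the weighted mean agree, which is the sense in which spot weights enter a weighted linear model.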
Gordon
----------------------------------------------------------------------
Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3050, Australia
Tel: (03) 9345 2326, Fax (03) 9347 0852,
Email: smyth@wehi.edu.au, www: http://www.statsci.org
Graham,
Many thanks for this further info.
I am taking from your remarks on AmpCh1 and AmpCh2 that we can read
these two columns into R and ignore the various ratio columns, as
these can be re-computed from AmpCh1 and AmpCh2.
You are describing the "confidence estimate" as an intuitive measure.
I understand the need for something intuitive. Unfortunately, for use
in numerical calculations we need a measure which is quantitatively
related to something, e.g., is quantitatively related to the estimated
variance of the log-ratio in some way.
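One example of such a quantitative relationship is inverse-variance weighting. The base-R sketch below assumes, purely for illustration, that the variance of each log-ratio is known and that the arrays are independent; the numbers are made up and nothing here is taken from BlueFuse.

```r
# Made-up log-ratios for one gene on three arrays, with assumed known
# variances (the third spot is much noisier than the others).
M <- c(1.0, 1.2, 3.0)
v <- c(0.05, 0.10, 2.00)

w <- 1 / v                              # inverse-variance weights
estimate <- sum(w * M) / sum(w)         # weighted mean of the log-ratios
var_weighted <- 1 / sum(w)              # variance of the weighted mean
var_unweighted <- sum(v) / length(v)^2  # variance of the plain mean

print(c(estimate = estimate, var_weighted = var_weighted,
        var_unweighted = var_unweighted))
```

With weights defined this way the weighted mean has a smaller variance than the plain mean, which is why a weight tied to the variance of the log-ratio is directly usable in the calculations.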
Gordon
At 12:45 AM 4/07/2004, Graham Snudden wrote:
>Gordon,
>
>To pick up the points raised in the mail below.
>
>1. The confidence estimate to which Liz refers is derived from the
>posterior distributions returned by the Bayesian framework that we are
>using to estimate the biological signal at each spot location. The
>underlying framework is relatively complex and provides a number of
>metrics relating to the signal in each channel. In order to simplify
>these metrics and make them intuitive to the end user (biologists) we
>generate a single confidence estimate. This estimate reflects the
>distribution of the ratio, i.e. how confident are we in the value
>calculated for the ratio. In most cases a tight signal distribution in
>each channel will lead to a high confidence; however, it is possible
>that a very broad distribution in one channel (a weak, or saturated,
>spot) and a very tight distribution in the other will also lead to a
>low confidence. A positive control, with near zero signal in one
>channel, will therefore return a low confidence reflecting the high
>degree of ambiguity in the actual value returned for the ratio. The
>associated confidence flag is derived from the confidence estimate by
>a simple lookup table which is under user control. This is described
>on the website if you follow the 'colour coded confidence flags' link
>on the product page; http://www.cambridgebluegnome.com/products/index.htm.
>
>2. The AmpCh1 and AmpCh2 columns return our estimate of the total
>signal in each channel. Clearly, as we are not thresholding out an
>area of signal, we have no concept of mean or median pixel intensity;
>neither do we need to perform background subtraction, as the amount of
>signal per spot is returned by the underlying models independent of
>any noise processes. The ratio is the ratio between the two channels.
>
>If you need additional technical information I could put you in touch
>with our academic founders out of the signal processing lab here in
>Cambridge. Clearly we are using a very different approach from more
>traditional threshold/template based solutions; however, our
>experience is that the Bayesian approach offers significant advantages
>in terms of robustness, automation, detection, accuracy and, as
>described above, confidence estimation.
>
>Best regards
>
>Graham Snudden
>VP Engineering
>BlueGnome Ltd