Hi!
Over the last few days we've been learning a lot about alternative ways
of dealing with low-intensity probesets, and have heard some pretty
strong arguments in favour of alternative techniques for handling them.
Firstly, thanks - the discussion has been really helpful and much
appreciated!
These have now sparked a different question for us:
We have an ever-increasing database of Affymetrix chips... Currently
these have been processed and normalised using MAS5.0. As we add
arrays to the set, we can compare between them, since the normalisation
simply scales them all to have the same average intensity.
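The MAS5.0-style scaling Crispin describes can be sketched in a few lines. This is a toy numpy illustration, not the actual MAS5.0 algorithm (the real implementation lives in Affymetrix's software and Bioconductor's affy package); the target value and trim fraction here are arbitrary placeholders. The key property is that each chip is scaled independently, so adding a chip never changes existing values:

```python
import numpy as np

def scale_to_target(intensities, target=500.0, trim=0.02):
    """Scale one array so its trimmed mean equals `target`, in the
    spirit of MAS5-style global scaling. Each chip is handled on its
    own, so new chips can be added without reprocessing old ones."""
    x = np.sort(intensities)
    k = int(len(x) * trim)
    trimmed = x[k:len(x) - k] if k > 0 else x
    sf = target / trimmed.mean()   # per-chip scale factor
    return intensities * sf

rng = np.random.default_rng(0)
chip = rng.lognormal(mean=6.0, sigma=1.0, size=10_000)
scaled = scale_to_target(chip)
```

Because the scaling uses only that chip's own statistics, two chips processed a year apart are on the same scale by construction, which is exactly the comparability property the question is about.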
So the question is: if I normalise my data with, say, RMA, I get a
set of normalised arrays based on statistics generated over the set
of chips I normalise - i.e. each array is normalised in the context of
its peers, unlike with MAS5.0 (as I understand it). This is, I think, due
to the a(j) parameter in the RMA model, or phi(j) for dChip, which
represents the probe affinity effect and can be estimated if we have
'enough arrays' (from the Irizarry et al. 2003 Nucleic Acids Research paper).
Now, when we add experiments to the database, are the normalised
expression levels calculated for one experimental chip-set comparable
to the expression levels computed for another? If not, do I need to
apply RMA over the entire database each time I add a new experiment to
it? And is this possible in a reasonable amount of time and memory? If
not, do people have alternative suggestions? We are particularly
interested in clustering and the generation of expression profiles...
Crispin
http://bioinf.picr.man.ac.uk/mbcf/microarray_ma.shtml
--------------------------------------------------------
This email is confidential and intended solely for the use of th...
{{dropped}}
If your data is decent, what you describe won't be that big an issue,
but here are various strategies to solve the problem you describe:

0 - keep your CEL files and redo everything every time (con: not
efficient at all)
1 - do RMA at the probe level, then normalize the merged exprSets
before any expression-level analysis (con: you may over-normalize)
2 - decide on a "typical probe-level distribution" and always map to that
(con: requires the choice of a distribution and some extra coding)
3 - use a non-multi-array RMA (ra?): background correct, use a
non-multichip normalization such as rescaling (can vsn be made
mono-chip?), and use a robust summary, e.g. median, tukey.biweight, etc.
(con: under my definition of a good expression measure, it won't be as
good as RMA, but it'll be better than MAS 5.0). To see how well this
does you can put it through affycomp.biostat.jhsph.edu.

I would rank these strategies: 2, 1, 3, 0. To pick a
typical probe-level distribution in strategy 2, I
would use as many arrays as possible. I would not use a parametric
distribution, such as the normal, just for computational convenience.
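Strategy 2 - freezing a "typical probe-level distribution" and quantile-mapping every new array onto it - can be sketched compactly. This is a hedged numpy illustration of the idea (the thread's real tools are the R affy package and, later, approaches like frozen RMA); the training-batch sizes are arbitrary. The frozen target is the empirical quantiles averaged over as many arrays as possible, per the advice above, rather than a parametric distribution:

```python
import numpy as np

def reference_quantiles(arrays):
    """Estimate a fixed 'typical probe-level distribution' as the
    mean of the sorted intensities over a training batch of arrays."""
    return np.mean([np.sort(a) for a in arrays], axis=0)

def map_to_reference(array, ref):
    """Quantile-map one new array onto the frozen reference.
    Because the target never changes, arrays processed earlier
    keep their values when new chips arrive."""
    ranks = np.argsort(np.argsort(array))   # rank of each probe
    return ref[ranks]

rng = np.random.default_rng(1)
train = [rng.lognormal(6, 1, 5000) for _ in range(20)]
ref = reference_quantiles(train)

new_chip = rng.lognormal(6.3, 1.2, 5000)
normalized = map_to_reference(new_chip, ref)
```

After mapping, every chip has exactly the reference distribution, so chips normalised months apart remain directly comparable - which is what makes this strategy attractive for a growing database.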
On Wed, 4 Jun 2003, Crispin Miller wrote:
> [...]
Hi all,

Rafa wrote:
> 3 - use a non-multi-array RMA (ra?): background correct, use a
> non-multichip normalization such as rescaling (can vsn be made mono-chip?)

vsn is a multichip method; it cannot be used on a single chip. With some
modification to the code, it could be used to normalize one or several
additional chips against an existing batch of chips.

Best regards,
Wolfgang
Hi Rafael,
I was just wondering if you could give me your opinion on my method of
normalization. I was always under the impression that it is best to
renormalize the entire data set whenever you add or remove a chip.
This would correspond to your strategy 0. I do understand that this is
the most time-consuming method, but I have created a Visual Basic
interface that keeps track of all the .cel files we have for our lab.

So, whenever you wish to analyze a different group of files, it is a
matter of clicking on the data sets you wish to include, and from
there we normalize everything together from the .cel files using RMA.
It usually takes a matter of minutes to have everything renormalized
together, and we currently have a collection of about 250 Affy chips
so far that can be combined in any combination.

I thought this was the most precise way of creating normalized data
sets, but are the other methods you talked about better and more
accurate?
Thanks,
Richard Park
Computational Data Analyzer
Joslin Diabetes Center
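The batch step at the heart of Richard's workflow (Rafa's strategy 0) is a full multi-array quantile normalization re-run over the pooled set. A hedged numpy sketch of that step - the real pipeline reads CEL files and runs `rma()` from the R affy package; this only illustrates the pooled normalization and its consequence that earlier values can shift on each re-run:

```python
import numpy as np

def quantile_normalize(arrays):
    """RMA-style quantile normalization over the whole pooled set:
    every array is mapped onto the mean empirical distribution of
    the batch. Re-running this after each addition keeps all arrays
    mutually comparable, at the cost of recomputing - and slightly
    changing - previously reported values each time."""
    data = np.column_stack(arrays)                        # probes x chips
    ranks = np.argsort(np.argsort(data, axis=0), axis=0)  # rank per column
    mean_quantiles = np.sort(data, axis=0).mean(axis=1)   # batch target
    return mean_quantiles[ranks]

rng = np.random.default_rng(2)
chips = [rng.lognormal(6, 1, 5000) for _ in range(5)]
norm = quantile_normalize(chips)
```

Because the target distribution is recomputed from the current batch, adding a 251st chip changes the target, which is the trade-off Rafa's ranking weighs against the method's theoretical appeal.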
-----Original Message-----
From: Rafael A. Irizarry [mailto:ririzarr@jhsph.edu]
Sent: Wednesday, June 04, 2003 10:53 AM
To: Crispin Miller
Cc: Bioconductor (E-mail)
Subject: Re: [BioC] Adding chips to an existing set of normalised data
> [...]
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
I think approach 0 is theoretically the best.
The only reason I ranked it as worst is because of
how time-consuming it is. Of course, if someone has the time and
expertise to code a Visual Basic interface that handles 250 chips in
"a matter of minutes", then I would re-rank this approach as my favorite.
On Wed, 4 Jun 2003, Park, Richard wrote:
> [...]