Levi Waldron
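Depending on which Affy platform you have, you may also be able to use
Frozen RMA (Bioconductor package frma). Then it doesn't matter which option
you choose, because each array is preprocessed against frozen reference
parameters rather than against the other arrays in the batch.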
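This minimal sketch shows why a frozen reference makes the choice irrelevant. It quantile-normalizes each array against a fixed reference distribution, which is the idea behind fRMA's normalization step. It is not the frma package itself: real fRMA ships a precomputed reference per platform and also uses frozen probe effects for summarization. The reference vector and the toy arrays are hypothetical.

```python
def normalize_to_reference(array, reference):
    """Give each value in `array` the reference value of the same rank.

    `reference` is a hypothetical frozen distribution with the same length
    as `array`. Ties are broken by position, which is a simplification.
    """
    if len(array) != len(reference):
        raise ValueError("array and reference must have the same length")
    ref_sorted = sorted(reference)
    order = sorted(range(len(array)), key=lambda i: array[i])
    out = [0.0] * len(array)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out


def frozen_normalize(arrays, reference):
    # Each array is normalized on its own, so the result for one array
    # cannot depend on which other arrays are in the batch.
    return [normalize_to_reference(a, reference) for a in arrays]


reference = [1.0, 2.0, 4.0, 8.0]  # hypothetical frozen reference
solid = [[5.0, 1.0, 3.0, 9.0], [2.0, 7.0, 6.0, 1.0]]
blood = [[9.0, 9.5, 0.5, 4.0]]

with_blood = frozen_normalize(solid + blood, reference)[: len(solid)]
without_blood = frozen_normalize(solid, reference)
print(with_blood == without_blood)  # True
```

Because no information is pooled across arrays, including or excluding the blood samples cannot change the solid-tumor results.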
On Wed, Jul 30, 2014 at 5:32 AM, Wolfgang Huber <whuber@embl.de> wrote:
> Dear Bernard
>
> my preference would be option 2, but the first thing to do if you're
> unsure is to try both and see if it makes any difference. Presumably the
> differences are minimal and within the uncertainty of your analysis.
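Trying both options is easy to script. This sketch uses plain quantile normalization on hypothetical random data. Quantile normalization is only the normalization step inside rma; the sketch omits rma's background correction and median-polish summarization. The normalized values change depending on whether the blood arrays are included, because the target distribution is the average over all arrays in the run. Each array's within-array gene ranking does not change. In full rma, median polish also pools information across arrays, so real rankings may shift somewhat more than this toy example suggests.

```python
import random


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of values per array).

    The target distribution is the mean of the sorted arrays, so it depends
    on which arrays are included. Ties are broken by position.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    target = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        result.append(out)
    return result


def ranking(values):
    """Gene indices ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


random.seed(1)
n_genes = 6
solid = [[random.gauss(8, 2) for _ in range(n_genes)] for _ in range(5)]
# Hypothetical blood arrays with a shifted, wider distribution.
blood = [[random.gauss(10, 3) for _ in range(n_genes)] for _ in range(2)]

option1 = quantile_normalize(solid + blood)[: len(solid)]  # all arrays, then drop blood
option2 = quantile_normalize(solid)                         # solid arrays only

max_diff = max(abs(x - y) for a, b in zip(option1, option2) for x, y in zip(a, b))
same_ranks = all(ranking(a) == ranking(b) for a, b in zip(option1, option2))
print(f"largest value difference: {max_diff:.3f}")
print("within-array gene rankings identical:", same_ranks)
```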
>
> If option 1 were the right thing to do, then by the same logic you could
> go out to the internet (ArrayExpress, GEO), download a few thousand more
> arrays, throw them in, and get even better results.
>
> The view that the "purpose of normalization is to remove batch effects" is
> not quite right. Batch effects can affect the data in all sorts of ways,
> but rma, for example, only addresses those types of effects that affect
> all the data on an array in the same way, i.e. overall higher or lower
> background, overall more or less cDNA used, or overall longer or shorter
> exposure to the scanner. What it does not remove is, for instance, a
> change in the way that the signal depends on probe GC content or cDNA
> length (and this can happen as reagents and materials change).
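This toy example illustrates the distinction, using quantile normalization as a stand-in for rma's normalization step. It assumes hypothetical probe intensities and GC fractions. An array that is uniformly brighter, meaning the same monotone change for every probe, normalizes back to exactly the same values. An array whose signal shifts by an amount that depends on each probe's GC content does not, because the distortion changes the probes' relative order within the array.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays to the mean of their sorted values."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    target = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities and GC fractions for five probes.
true_signal = [3.0, 5.0, 7.0, 9.0, 11.0]
gc_fraction = [0.8, 0.2, 0.6, 0.3, 0.5]

# Array-wide effect: every probe is scaled and offset in the same way.
brighter = [1.5 * x + 2.0 for x in true_signal]
# Probe-specific effect: the gain depends on GC content (e.g. after a reagent change).
gc_biased = [x + 6.0 * g for x, g in zip(true_signal, gc_fraction)]

ref, bright_norm, gc_norm = quantile_normalize([true_signal, brighter, gc_biased])
print(bright_norm == ref)  # True: the array-wide effect is removed
print(gc_norm == ref)      # False: the GC-dependent distortion survives
```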
>
> Best wishes
> Wolfgang
>
> On Jul 29, 2014, at 20:32 EDT, Bernard Lee Kok Bang
> <bernard.lee@carif.com.my> wrote:
>
> > Dear all, I would like to ask a question regarding microarray data
> > normalization.
> >
> > Scenario:
> > I have in hand the raw .CEL files for a collection of 300 cancer cell
> > lines (multiple cancer types), all from the same study/batch. My aim is
> > to obtain the gene expression values and use them downstream. However, I
> > am only interested in a subset of these .CEL files; for example, I am
> > only interested in the NON-blood cancer cell lines (n = 250).
> >
> > I'm wondering which of these two options is more appropriate for my
> > scenario:
> >
> > Option 1:
> > 1) Normalize all 300 .CEL files by rma.
> > 2) After normalization, manually remove the 50 blood samples I am NOT
> >    interested in.
> > 3) Use the normalized data of the 250 samples for downstream analysis.
> >
> > Option 2:
> > 1) Normalize ONLY the 250 .CEL files by rma (as if the 50 blood samples
> >    did not exist).
> > 2) Use the normalized data of the 250 samples for downstream analysis.
> >
> > My downstream analysis simply involves ranking the genes from highest to
> > lowest expression.
> >
> > From my point of view, I favor the first option. Since I have data for
> > all the solid tumor and blood cell lines, I might as well normalize them
> > all together first before manually excluding the blood cell lines, as to
> > my knowledge the purpose of normalization is to remove batch effects,
> > isn't it? So the larger the sample size during rma normalization, the
> > better?
> >
> >
> > Thanks in advance.
> >
> > Bernard Lee
> > Research Assistant
> > Cancer Research Initiatives Foundation (CARIF)
> > University of Malaya (UM)
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Levi Waldron
Assistant Professor of Biostatistics
City University of New York School of Public Health, Hunter College
2180 3rd Ave Rm 538
New York NY 10035-4003
phone: 212-396-7747
www.waldronlab.org