Alex Tsoi wrote:
> Dear all,
>
> I have a cancer dataset from GEO that is labeled as having the platform GPL91
> (HG-U95A), but when I use justRMA() to read the data, I realize that the
> GSMs come from both HG_U95A and HG_U95Av2, and that gives me an error. I could
> analyze the data separately, but I would like to ask whether anyone has
> experience with or comments about the differences between the two platforms,
> and whether I could treat the data as coming from one platform and analyze
> them together (e.g. by using RMA); of course, in that case I would have to
> "make" R believe that they come from only one platform. Or what is the most
> appropriate way to analyze this kind of data?
>
> I would greatly appreciate any help and comments.
>
> P.S.: This is a cancer dataset with two disease states, and samples of each
> type could come from either HG-U95A or HG_U95Av2.
>
>
This is a difficult problem, since there are platform-specific effects.
For example, you might think that a probeset shared between the two
platforms would be safe to compare, but unfortunately it will behave
slightly differently on one platform than on the other, even though in
theory it is measuring the same thing.

You could start by normalizing these two array types in separate pools.
Then take the probesets that are supposedly shared between them and look
at how they behave in their respective conditions. In general, I expect
you will find that shared probesets move in the same direction on each
platform under your experimental conditions, but that you get different
absolute values on one platform than on the other for a given condition.
In other words, both the condition and the platform contribute to the
overall signal. The easiest thing is always to look at one platform at a
time, but if you *must* combine them, grab a statistician first to help
you do something sensible.
good luck,
Marc
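
A rough sketch of the workflow Marc describes above, in Python with only the standard library. It is illustrative only: real data would go through RMA (background correction, quantile normalization and median-polish summarization) separately for each platform, whereas here quantile normalization is applied directly to toy probeset-level values. The data layout, the function names, and the assumption that both disease states were measured on both platforms are hypothetical.

```python
from statistics import mean

def quantile_normalize(columns):
    """Map every array (column) onto a shared reference distribution.

    `columns` is a list of equal-length lists of log2 intensities, one per array.
    The reference is the mean of the sorted columns; ties are broken by position.
    """
    n = len(columns[0])
    reference = [mean(vals) for vals in zip(*(sorted(col) for col in columns))]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def condition_means(probesets, samples):
    """Normalize one platform's arrays as a separate pool, then average per condition.

    `samples` is a list of (condition, values) pairs; `values` follows `probesets` order.
    Returns {probeset: {condition: mean normalized value}}.
    """
    normalized = quantile_normalize([values for _, values in samples])
    result = {}
    for i, ps in enumerate(probesets):
        by_condition = {}
        for (condition, _), col in zip(samples, normalized):
            by_condition.setdefault(condition, []).append(col[i])
        result[ps] = {cond: mean(vals) for cond, vals in by_condition.items()}
    return result

def compare_shared(pool_a, pool_b, baseline, other):
    """For probesets present on both platforms, compare the condition effect.

    Returns {probeset: (difference on A, difference on B, same direction?)}.
    Assumes both conditions were measured on both platforms.
    """
    report = {}
    for ps in sorted(set(pool_a) & set(pool_b)):
        diff_a = pool_a[ps][other] - pool_a[ps][baseline]
        diff_b = pool_b[ps][other] - pool_b[ps][baseline]
        report[ps] = (diff_a, diff_b, (diff_a > 0) == (diff_b > 0))
    return report

# Toy data: p3 exists only on platform A, p4 only on platform B.
pool_a = condition_means(["p1", "p2", "p3"],
                         [("normal", [1, 5, 9]), ("tumor", [3, 8, 2])])
pool_b = condition_means(["p1", "p2", "p4"],
                         [("normal", [2, 6, 7]), ("tumor", [5, 9, 1])])
report = compare_shared(pool_a, pool_b, "normal", "tumor")
for ps, (diff_a, diff_b, agree) in report.items():
    print(ps, diff_a, diff_b, "same direction" if agree else "opposite")
```

In this toy example both shared probesets move in the same direction on the two platforms but by different amounts, which is the pattern Marc expects: the condition and the platform both contribute to the signal.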
My take on the matter (from the very distant past, mind you, so none of
the packaged material there will work now or with any recent version of
the software) is here:
http://bmbolstad.com/misc/mixtureCDF/MixtureCDF.html
The key point there is that the differences between U95A and U95Av2
are fairly small: only a relatively small handful of probesets differ
between the two (I think something like 25 out of ~12,600).
Best,
Ben
On Wed, 2007-07-18 at 18:27 -0400, Alex Tsoi wrote:
> [...]
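
If only a handful of probesets differ, one simple option is to restrict a combined analysis to the probesets that are identical on both chips. The sketch below (Python, standard library only) assumes you have, for each chip, a mapping from probeset ID to its probe sequences, e.g. taken from the chips' probe-sequence annotation. The IDs, sequences and function names are made up for illustration, and this is not the mixture-CDF approach on Ben's page.

```python
def split_probesets(probes_a, probes_b):
    """Partition probeset IDs by whether their probes match across two chips.

    `probes_a` and `probes_b` map probeset ID -> tuple of probe sequences.
    A probeset counts as shared only if it exists on both chips with identical probes.
    """
    identical, differing = [], []
    for ps in sorted(set(probes_a) | set(probes_b)):
        if ps in probes_a and ps in probes_b and probes_a[ps] == probes_b[ps]:
            identical.append(ps)
        else:
            differing.append(ps)
    return identical, differing

def restrict(expression, keep):
    """Keep only the rows (probesets) of `expression` that are listed in `keep`."""
    keep = set(keep)
    return {ps: values for ps, values in expression.items() if ps in keep}

# Hypothetical toy annotation: ps2 was redesigned, ps3 and ps4 exist on one chip only.
probes_u95a = {"ps1": ("ACGTACGT", "TTGACCAA"), "ps2": ("GGCCTTAA",), "ps3": ("AATTCCGG",)}
probes_u95av2 = {"ps1": ("ACGTACGT", "TTGACCAA"), "ps2": ("GGCATTAA",), "ps4": ("CCGGAATT",)}

identical, differing = split_probesets(probes_u95a, probes_u95av2)
print("shared:", identical, "differing:", differing)
```

The same `restrict` call would then be applied to the expression values from each chip before they are pooled, so that only probesets measured with identical probes enter the combined analysis.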
On Jul 19, 2007, at 6:00 AM, bioconductor-request at stat.math.ethz.ch wrote:
> [...]
I would begin as Marc suggests, and then explore integrative
correlation as a way to identify a set of genes that is reproducible
across two or more studies (platforms). See the MergeMaid package and
the references therein. Once you have identified a reproducible set,
you may want to explore the packages metaArray and GeneMeta.
Rob
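
To make the idea of integrative correlation concrete: as I understand the approach behind MergeMaid, for each gene shared by two studies you compute its correlation with every other shared gene within each study, and then correlate those two correlation profiles; genes whose co-expression pattern is reproduced across studies score close to 1. The sketch below (Python, standard library only) is my own illustration of that idea, not MergeMaid's code, and the gene names and values are invented. Because it only compares correlations computed within each study, the two studies may have different numbers of samples and different platform-specific scales.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def integrative_correlation(study_a, study_b):
    """Score each gene shared by two studies.

    Each study maps gene -> expression values across that study's own samples.
    For gene g, build its vector of correlations with every other shared gene in
    each study, then return the correlation between the two vectors.
    """
    genes = sorted(set(study_a) & set(study_b))
    scores = {}
    for g in genes:
        others = [h for h in genes if h != g]
        profile_a = [pearson(study_a[g], study_a[h]) for h in others]
        profile_b = [pearson(study_b[g], study_b[h]) for h in others]
        scores[g] = pearson(profile_a, profile_b)
    return scores

# Invented data: study A has 4 arrays, study B has 3; g5 is measured only in B.
study_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 3, 2, 2], "g4": [1, 3, 2, 5]}
study_b = {"g1": [5, 6, 8], "g2": [1, 2, 4], "g3": [3, 2, 1], "g4": [2, 2, 3], "g5": [1, 1, 2]}

for gene, score in integrative_correlation(study_a, study_b).items():
    print(gene, round(score, 3))
```

Genes with high scores are candidates for the reproducible set mentioned above; genes present in only one study (g5 here) are simply left out.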