Dear list members,

I have a set of Affymetrix data from 10 HG_U133A arrays, separated into two unpaired groups of 5 arrays each. I processed the data using LIMMA and dChip. For dChip, I used all the default settings. The resulting lists of differentially expressed genes from the two methods have less than 50% in common.

Why is the overlap between the two results so low? Is there a problem? Can anyone help me?

Thanks in advance,
Jun
Your question is a bit vague and you provide little information. I do not think LIMMA has preprocessing capabilities for Affymetrix data.

1) How did you preprocess the data?

2) How did you "analyse" your data in dChip? What technique (e.g. fold change, t-test, Wilcoxon) did you use in dChip?

3) How did you select the differentially expressed genes (e.g. via a p-value cutoff or biological significance)?

One possibility is that you are using very different test statistics. With 5 arrays in each group, it is difficult to draw any conclusions, as some methods are more robust than others with a small number of arrays.

Another is that you chose a threshold that includes a lot of noisy genes. An extreme example is to select all genes with a p-value less than 1, in which case you get 100% agreement between the two methods.

And yet another is that you may have made a coding/programming error somewhere.

Regards, Adai
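The point about thresholds can be made concrete with a small sketch. It is only an illustration: the p-values below are made up for five hypothetical genes, and "agreement" is assumed here to mean the share of genes selected by either method that are selected by both (other definitions are possible).

```python
def selected(pvalues, cutoff):
    """Return the set of gene IDs whose p-value is strictly below the cutoff."""
    return {gene for gene, p in pvalues.items() if p < cutoff}


def percent_agreement(p_a, p_b, cutoff):
    """Percentage of genes selected by either method that are selected by both."""
    a, b = selected(p_a, cutoff), selected(p_b, cutoff)
    union = a | b
    if not union:
        return float("nan")  # nothing selected, so agreement is undefined
    return 100.0 * len(a & b) / len(union)


# Made-up p-values for five hypothetical genes (NOT real data).
method_a = {"g1": 0.001, "g2": 0.020, "g3": 0.300, "g4": 0.040, "g5": 0.900}
method_b = {"g1": 0.002, "g2": 0.200, "g3": 0.030, "g4": 0.045, "g5": 0.800}

# Agreement depends strongly on where the cutoff is placed.
for cutoff in (0.01, 0.05, 1.0):
    print(cutoff, percent_agreement(method_a, method_b, cutoff))
```

With a cutoff of 1 every gene is selected by both methods, so the agreement is trivially complete, while an intermediate cutoff can give a much lower figure for the same data.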
On Mon, 2005-03-07 at 14:15 -0500, jun.yan.a@utoronto.ca wrote:
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
We normalized the same data set using RMA and a very similar procedure that used Tukey's biweight within each array, instead of median polish, to combine probes into gene expression values. We then applied 2-sample t-tests and SAM to both sets of data. The overlap in the "top 100" and "top 200" sets of differentially expressed genes was 50%.

Normalization makes a huge difference, even though the correlation between the expression values, array by array, can be very close to 100%. This has been found many times. The recent thread "RMA vs gcRMA" sheds some light on this problem. I suspect that much of the difference lies in the low-expressing genes, but this does not mean that these genes are "absent".

--Naomi
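For readers unfamiliar with the two probe-summarization steps mentioned above, here is a rough sketch of both on a single probe set. It is not the RMA implementation nor the exact procedure used in this comparison: the median polish is the generic Tukey version, the biweight is a one-step estimate with assumed tuning constants (c = 5, eps = 1e-4), and the log-scale intensities are made up.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate for the probes of ONE array."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Probes far from the median get weight 0; nearby probes get weight near 1.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def median_polish_levels(table, n_iter=10):
    """Fit probe (row) + array (column) effects; return one level per array."""
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff, col_eff, overall = [0.0] * n_rows, [0.0] * n_cols, 0.0
    for _ in range(n_iter):
        for i in range(n_rows):  # sweep out row medians
            r = median(resid[i])
            row_eff[i] += r
            resid[i] = [x - r for x in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [e - shift for e in col_eff]
        for j in range(n_cols):  # sweep out column medians
            cmed = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += cmed
            for i in range(n_rows):
                resid[i][j] -= cmed
        shift = median(row_eff)
        overall += shift
        row_eff = [e - shift for e in row_eff]
    return [overall + e for e in col_eff]


# Made-up log2 intensities: 3 probes (rows) x 3 arrays (columns), NOT real data.
probe_set = [[8.0, 9.0, 10.0],
             [9.0, 10.0, 11.0],
             [7.0, 8.0, 9.0]]

per_array_biweight = [tukey_biweight([row[j] for row in probe_set])
                      for j in range(3)]
print(per_array_biweight)
print(median_polish_levels(probe_set))
```

The structural difference is that the biweight summarizes each array on its own, whereas median polish estimates probe effects from all arrays jointly before reporting an array-level value. On noisy or low-intensity probe sets the two can therefore rank genes differently even when the summarized values are highly correlated.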
At 02:46 PM 3/7/2005, Adaikalavan Ramasamy wrote:
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
I agree with Naomi: those low-expressing genes might still be present, even though their expression is low. For RMA- and GCRMA-normalized data, some of the low-expression data also agree well, while there are also discrepancies in the high-expression range.

My confusion is what to do. Should I filter out genes with inconsistent results between RMA and GCRMA, or filter out genes with low intensities (MAS5 call?) and use one normalization result to draw conclusions?

Thanks!
Fangxin
--
Fangxin Hong, Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong@salk.edu
What result do you get if you try to estimate how many genes are changing, and then compute the Spearman rank correlation for that set?

This seems a more meaningful metric, as up to 50% of genes in some experiments may be changing.
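As a rough sketch of this suggestion, the snippet below computes a Spearman rank correlation between the test statistics of two pipelines, restricted to a chosen set of genes. The gene names, statistics, and the `changing` set are all hypothetical; how the set of changing genes is estimated is left open, as it is in the question.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of positions pos..end, 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_on_subset(stat_a, stat_b, genes):
    """Spearman correlation of two per-gene statistics over a given gene set."""
    genes = sorted(genes)
    return spearman([stat_a[g] for g in genes], [stat_b[g] for g in genes])


# Made-up t-statistics from two hypothetical pipelines (NOT real data).
t_a = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0, "g6": 0.1}
t_b = {"g1": 2.0, "g2": 1.0, "g3": 4.0, "g4": 3.0, "g5": 5.0, "g6": 9.0}
changing = {"g1", "g2", "g3", "g4", "g5"}  # assumed set of changing genes

print(spearman_on_subset(t_a, t_b, changing))
```

Unlike a "top 100" overlap, this uses the whole ordering within the chosen set, so a few genes swapping places near the cutoff has little effect on the result.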
-----Original Message-----
From: Naomi Altman
To: ramasamy@cancer.org.uk; jun.yan.a@utoronto.ca
Cc: BioConductor mailing list
Sent: 3/13/05 6:00 PM
Subject: Re: [BioC] LIMMA vs. dChip
**********************************************************************
This email and any files transmitted with it are
confidentia...{{dropped}}
We did not do any further analysis, and we currently have no plans to do any. To really solve this, a properly designed experiment, possibly WT versus a well-understood knockout, should be done. The data we have at hand are not suitable for determining which normalization is best for detecting differential expression.

--Naomi
At 04:54 AM 3/14/2005, Stephen Henderson wrote:
True, but I think you may be overstating the problem.

Differences in the tail are not all that interesting biologically. The range of RMA values is squashed compared to other methods, and Tukey's biweight has an unreliable baseline for low-expressing values. The top 100 is often a small fraction of the genes, and the tail may not be that extreme.

The interesting point is whether all the data considered significant by one test are significant by the other and, as you say, how well correlated the raw data are.

I think??? I sometimes worry about this too.

S
-----Original Message-----
From: Naomi Altman
To: Stephen Henderson; 'ramasamy@cancer.org.uk'; 'jun.yan.a@utoronto.ca'
Cc: 'BioConductor mailing list '
Sent: 3/14/05 12:57 PM
Subject: RE: [BioC] LIMMA vs. dChip