On Thursday 05 October 2006 21:14, Lana Schaffer wrote:
> Hi,
> This experiment involves expression analysis of 2 batches of samples
> with an AGE variable, which were normalized separately because the
> batches did not cluster together. The ages in the second set fall
> in between the ages of the first set, and all the data are meant to
> be analyzed together. What happened is that the lab researchers
> graphed the separately normalized expression values together and
> plotted expression values vs. age. With the diseased samples, there
> were genes that showed a "trend" with age, with R-squared values
> between 0.16 and 0.4. From my training, I understand that such a
> trend explains only 16-40% of the variance in the data and would not
> be considered significant. However, Prism reports these R-squared
> values as significantly different from zero. The researchers explain
> to me that this is how data are presented in their field and that an
> R-squared of 0.16-0.4 is considered an excellent result. Indeed, for
> non-diseased individuals the R-squared values are zero for these
> genes. I understand that in their field "any" trend is better than no
> trend, especially since the samples are heterogeneous. However, this
> is not what is taught in statistics. These graphs will be submitted
> to journals with me as an author, and I am a bit shaken up.
> Would you please give me your thoughts on combining the 2 sets of
> expression values and on the significance of the R-squared values?
> Thanks,
> Lana
Lana,
There seem to be two issues. The first is the normalization of two
separate batches, which is appropriate. What isn't clear is whether
doing so introduces bias in downstream analyses; you will need to judge
this for yourself.
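One way to make that judgment concrete is to compare the pooled age slope with the slope estimated within each batch. This is a minimal sketch under stated assumptions, not a method from this thread: `age`, `expr`, and `batch` are hypothetical per-sample lists for a single gene, and the function names are illustrative. Centering age and expression on each batch's own means gives the same age coefficient as an ordinary least-squares fit of expression on age plus a batch factor, so a constant per-batch offset cannot produce a trend in the within-batch estimate. If the pooled slope is much larger than the within-batch slope, the apparent age trend may be coming from the batch split rather than from age.

```python
from collections import defaultdict


def pooled_slope(age, expr):
    """Least-squares slope of expression on age, ignoring batch."""
    ma = sum(age) / len(age)
    me = sum(expr) / len(expr)
    sxy = sum((a - ma) * (e - me) for a, e in zip(age, expr))
    sxx = sum((a - ma) ** 2 for a in age)
    return sxy / sxx


def within_batch_slope(age, expr, batch):
    """Age slope after centering age and expression within each batch.

    Equals the age coefficient of OLS expr ~ age + batch (batch as a
    factor). Assumes each batch contains at least two distinct ages.
    """
    groups = defaultdict(list)
    for a, e, b in zip(age, expr, batch):
        groups[b].append((a, e))
    sxy = sxx = 0.0
    for pts in groups.values():
        ma = sum(a for a, _ in pts) / len(pts)
        me = sum(e for _, e in pts) / len(pts)
        sxy += sum((a - ma) * (e - me) for a, e in pts)
        sxx += sum((a - ma) ** 2 for a, _ in pts)
    return sxy / sxx


# Hypothetical gene with no age effect at all, only a batch offset of +1.
age = [20, 25, 30, 35, 50, 55, 60, 65]
batch = ["b1"] * 4 + ["b2"] * 4
expr = [0.0] * 4 + [1.0] * 4
print(pooled_slope(age, expr))               # positive: "trend" from batch alone
print(within_batch_slope(age, expr, batch))  # 0.0: no trend within batches
```

The toy data deliberately contain no age effect, so any pooled trend is produced by the batch offset alone; with real data the two slopes would be compared gene by gene.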
The second issue is the significance of the R-squared values. To
convince yourself and your collaborators of the significance (or lack
thereof) of the computed values, you can test whether the R-squared is
significant. I would suggest a permutation-based analysis, but the
method is up to you. Judging an R-squared value by looking at the raw
number alone is probably not a valid way to determine significance.
Sean
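A minimal sketch of a permutation test for one gene's R-squared, assuming the samples are exchangeable under the null hypothesis of no age association. It shuffles the expression values relative to age, recomputes R-squared each time, and reports the fraction of shuffles (counting the observed labeling as one) whose R-squared is at least as large as the observed value. The function names and the `n_perm` and `seed` parameters are illustrative choices, not an existing package API. Because the two batches were normalized separately, shuffling only within each batch is a more cautious variant worth considering when age and batch are related.

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared Pearson r).

    Assumes neither x nor y is constant.
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(age, expr, n_perm=9999, seed=0):
    """Return (observed R-squared, permutation p-value) for one gene."""
    rng = random.Random(seed)
    observed = r_squared(age, expr)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between age and expression
        if r_squared(age, shuffled) >= observed:
            hits += 1
    # Count the observed labeling as one permutation so p is never 0.
    return observed, (hits + 1) / (n_perm + 1)


# Usage with hypothetical data: 20 ages and a weak linear trend plus noise.
noise = random.Random(1)
ages = list(range(20, 80, 3))
values = [0.01 * a + noise.gauss(0, 0.5) for a in ages]
print(permutation_p_value(ages, values, n_perm=999))
```

With many genes, the per-gene p-values from a test like this would still need a multiple-testing correction before being reported.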
It is important to remember that statistical significance refers to
the investigator's ability to reproduce the result. A result can be
statistically significant without having biological
significance. The reported R-squared values might be statistically
significant but biologically insignificant.
--Naomi
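The gap between statistical and biological significance can be seen from the test Prism applies to a simple linear regression. For n samples, the usual t (equivalently F) test of a zero slope has two-sided p-value I_{1-R^2}((n-2)/2, 1/2), where I is the regularized incomplete beta function. The p-value therefore depends on n as well as on R-squared, while R-squared itself measures only the share of variance explained. The sketch below evaluates that formula with the Python standard library only; Simpson's rule is an implementation choice made here for self-containment, and the function names are illustrative.

```python
import math


def regularized_beta(x, a, b, steps=20000):
    """I_x(a, b) by composite Simpson's rule.

    Assumes 0 < x < 1 and a >= 1 so the integrand is finite on [0, x];
    `steps` must be even.
    """
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return total * h / 3 / math.exp(log_beta)


def slope_p_value(r2, n):
    """Two-sided p-value of the t/F test for a zero slope, from R-squared and n."""
    if not 0 < r2 < 1 or n < 4:
        raise ValueError("need 0 < r2 < 1 and n >= 4")
    return regularized_beta(1 - r2, (n - 2) / 2, 0.5)


# Same effect size (R-squared = 0.16), increasing sample size.
for n in (10, 20, 40, 100):
    print(n, round(slope_p_value(0.16, n), 4))
```

For R-squared = 0.16, the p-value is well above 0.05 at n = 10 and drops below 0.05 somewhere between n = 20 and n = 40, although the fraction of variance explained never changes.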
At 06:57 AM 10/6/2006, Sean Davis wrote:
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111