Separate Normalizations and expression plotting


Lana Schaffer ★ 1.3k

@lana-schaffer-1056

Last seen 9.7 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061005/985bca96/attachment.pl


ADD COMMENT • link updated 17.6 years ago by Sean Davis 21k • written 17.6 years ago by Lana Schaffer ★ 1.3k


Sean Davis 21k

@sean-davis-490

Last seen 4 months ago

United States

On Thursday 05 October 2006 21:14, Lana Schaffer wrote:

> Hi,
> This experiment involves the expression analysis between 2 batches of AGE variable samples which were normalized separately because the batches did not cluster together. The age groups of the second set were in between the ages from the first set, and all the data are to be analyzed together.
>
> What happened is that the lab researchers graphed the separately normalized expression values together and plotted expression values vs age. With the diseased samples there were genes which showed a "trend" with age, where the R-squared values were between .16 and .4. From my training I understand that this trend explains only 16-40% of the data and would not be significant. However, using Prism these R-squared values are called significantly different from zero.
>
> These researchers explain to me that this is the way data is presented in their field and that an R-squared of .16-.4 is considered an excellent result. Indeed, with non-diseased individuals the R-squared values are zero for these genes. I understand that in their field "any" trend is better than no trend, especially since the samples are heterogeneous. However, this is not what is taught in statistics. These graphs will be submitted to journals under my authorship and I am a bit shaken up.
>
> Would you please comment on the combination of the 2 sets of expression values and the significance of the R-squared values.
> Thanks,
> Lana

Lana,

There are two issues, it seems. The first is normalization of two separate batches, which is appropriate. What isn't clear is whether doing so introduces bias in downstream analyses; this you will need to judge for yourself.

The second is the significance of the R-squared values. To convince yourself and your collaborators of the significance, or lack thereof, of the computed values, one can test whether the R-squared is significant or not. I would suggest using a permutation-based analysis, but the method is up to you. Judging an R-squared value by looking at the raw number is probably not a valid method for determining significance.

Sean
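A minimal sketch of what a permutation-based test of R-squared could look like for one gene, assuming a simple linear fit of expression against age. The function names, the number of permutations, and the example numbers are illustrative assumptions, not taken from the thread. The expression values are shuffled relative to age many times, and the p-value is the fraction of shuffles whose R-squared is at least as large as the observed one.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, which equals R-squared for a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: no linear trend to measure
    return (sxy * sxy) / (sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Return (observed R-squared, permutation p-value).

    n_perm and seed are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real age/expression link
        if r_squared(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical ages and expression values for a single gene.
    ages = [22, 25, 31, 34, 40, 43, 47, 52, 58, 61, 66, 70]
    expr = [7.1, 6.8, 7.4, 7.0, 7.6, 7.2, 7.9, 7.3, 7.8, 8.1, 7.5, 8.0]
    r2, p = permutation_p_value(ages, expr)
    print(f"R-squared = {r2:.3f}, permutation p = {p:.4f}")
```

Plain shuffling treats all samples as exchangeable. With two separately normalized batches that may not hold, so shuffling within each batch is one variant worth considering.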

ADD COMMENT • link 17.6 years ago Sean Davis 21k


It is important to remember that statistical significance refers to the investigator's ability to reproduce the result. A result can be statistically significant without having biological significance. The reported R-sq might be statistically significant but biologically insignificant.

--Naomi

Naomi S. Altman
Associate Professor, Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)
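As a worked illustration of why the same R-squared can be significant in one data set and not in another: for a simple linear regression on n samples, the standard F-test of a zero slope depends on both R-squared and n. The sample sizes used below are hypothetical, since the thread does not state them.

```latex
F = \frac{R^2\,(n-2)}{1-R^2} \sim F_{1,\,n-2} \quad \text{under the null hypothesis of zero slope}

% R^2 = 0.16, n = 10:  F = 0.16 \cdot 8 / 0.84 \approx 1.52 < 5.32 \approx F_{0.05}(1, 8)   (not significant at the 5% level)
% R^2 = 0.16, n = 30:  F = 0.16 \cdot 28 / 0.84 \approx 5.33 > 4.20 \approx F_{0.05}(1, 28)  (significant at the 5% level)
```

So a program such as Prism can legitimately report R-squared = 0.16 as significantly different from zero when there are enough samples, even though the trend accounts for only 16% of the variance.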

ADD REPLY • link 17.6 years ago Naomi Altman ★ 6.0k
