Re: Logit-t vs RMA
Eric Blalock ▴ 250
@eric-blalock-78
Last seen 9.6 years ago
Lemon et al. developed an interesting and possibly improved gauge of confidence called the positive predictive value (PPV) that may be useful for future scientists looking to test their low-level algorithms on known data sets, but the heart of the paper has to do with their idea of transforming the intensity values.

The authors set out, using a variation on Langmuir's adsorption isotherm (that is, the classic semi-log sigmoidal dose-response relationship), to transform the intensity values of individual probes on the array. To me, this makes more biological sense than some other procedures because it is based on the ligand-receptor relationship between the probes and the mRNA species to which they are designed to hybridize.

However, when the authors combined their transformed feature-level information into a single measure per probe set, they found that their procedure (Logit-Exp and Logit-ExpR) performed no better than RMA or dChip.

Interestingly, if they did NOT collapse their probe-level data into a single probe-set value, and instead tested across all probes (Logit-t), their transformed data did a much better job of winnowing the wheat from the chaff. They concluded that "...the modeling paradigm may cause the loss of information from the probe-level data."

This seems critical to me: there is a huge discrepancy between the significant gene lists generated with different probe-level algorithms, and I don't believe we'll be able to understand why that dichotomy exists until we look at the underlying probe-level information.

I have been pleading with our stats department for over a year (I am just a neuroscientist, and I write code like a hippopotamus roller-skates) to employ a two-way repeated-measures ANOVA at the probe level to test for significance, and in fact went so far as to put the notion (with some sample data) into a book chapter I authored earlier this year (Chapter 6 in A Beginner's Guide to Microarrays).

The authors state that "the combination of logit transformation and probe-level statistical testing provides a means for greatly improved PPV...". I would agree, but add the caveat that the comparison at the probe level on untransformed values has yet to be done; thus the probe-level idea may be more important to improved PPV than the transformation notion. Other methods have looked at the probe-level information (e.g., Liu et al. 2002, the Affymetrix multiple pairwise comparison, though their use of the feature-level data as biological n may be inappropriate; and Zhang et al. 2002, though their intention was only a two-chip comparison).

I believe it is unfortunate that the authors resort to fold change as a final discriminator after all of that hard work, rather than a formal statistical test. I still feel that two-way repeated-measures ANOVA is the right test for this, but would love to hear from others.

-E

P.S. My apologies to Lemon et al. if I have misrepresented or misunderstood your work. I will gladly retract or correct this (or any part of it) at your request.

At 12:00 PM 9/25/2003 +0200, Dario Greco wrote:
> Hi everybody,
> I read the new paper on the "Logit-t" method to analyze Affy chips a few
> days ago:
>
> "A high performance test of differential gene expression for
> oligonucleotide arrays"
> William J Lemon, Sandya Liyanarachchi and Ming You
> Genome Biology 2003, 4:R67
>
> What do you think about this?
>
> Regards,
> Dario
@rafael-a-irizarry-205
Last seen 9.6 years ago
I haven't had time to read this paper carefully, but here are some minor comments from what I saw:

1) If I understood correctly, they compare their test to the t-test (for RMA, dChip, and MAS), which in this data set implies they are doing 3 versus 3 comparisons (is this right?). With an N=3 the t-test has very little power. In fact, we find that with N=3, in these data (Affy spike-in, etc.), average log fold change outperforms the t-test dramatically, and the SAM statistic does even better:

http://biosun01.biostat.jhsph.edu/~ririzarr/badttest.png

Notice in the posted figure that for high specificity (around 100 false positives), average log fold change gives twice as many true positives. So their conclusion should not be that Logit-t is better than RMA, but rather that Logit-t is a better test than a t-test when N=3, regardless of expression measure. Not an impressive feat. RMA is not a test; it is an expression measure. One can build tests with RMA, and some will be better than others. Judging by their ROC, RMA using the SAM statistic, or simply the average log fold change statistic, would outperform Logit-t.

2) Another problem I found is the use of the PPV at just one cutoff as an assessment. ROC curves where both true positives (TP) and false positives (FP) are shown are much more informative (notice that TP and FP can be calculated easily from the rates if one knows the number of spiked-in genes and the total number of genes on the array). The PPV can be computed at any cutoff, i.e. at any point on the ROC curve: TP/(TP+FP). In affycomp (http://affycomp.biostat.jhsph.edu) we show ROC curves where the FP go only up to 100, since having lists with more than 100 FP is not practical. See Figure 5 here:

http://biosun01.biostat.jhsph.edu/~ririzarr/papers/affycomp.pdf

When one computes a t-test and uses a p-value of 0.01 as the threshold, one is way outside this bound. So, IMHO, Table 1 in their paper is misleading: because the ROC curve flattens very quickly, if one changed the p-value cutoff to 0.001, the FPs for both RMA and dChip would drop dramatically, but the TPs would not drop much. This is why it is more informative to show ROC curves rather than just a number based on one point on the ROC curve. If a one-number summary is needed, the area under the ROC curve or the ROC convex hull are much better summaries than a single PPV.

The ROC curves shown in this paper go up to rates of 0.4 (5000 FP). For such tight comparisons, this should really go up to around 0.01 (100 FP) so one can see the area of interest. In our NAR paper we show rates up to 1.0, but that is because those comparisons were not tight at all.

3) A minor mistake: they incorrectly state that Affy's spike-in was done on the HGU95Av2 chip. It was done on the HGU95A chip.

4) Finally:

On Thu, 25 Sep 2003, Eric wrote:
> Lemon et al. developed an interesting and possibly improved gauge of
> confidence called the positive predictive value (PPV) that may be useful

The positive predictive value (PPV) is a term that has been around for decades. In medical language: "the positive predictive value of a test is the probability that the patient has the disease when restricted to those patients who test positive." A simple estimate is TP/(TP+FP).

Hope this helps,
Rafael
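To make the TP/FP/PPV bookkeeping above concrete: in a spike-in experiment the truly changed genes are known, so every cutoff on a ranking statistic yields a (FP, TP) point, and the PPV at that point is TP/(TP+FP). A minimal R sketch (simulated data; all object names are hypothetical, and this is not code from affycomp):

set.seed(1)
n.genes <- 12000
spiked  <- 1:16                        # indices of the known spike-in genes
stat    <- rnorm(n.genes)              # ranking statistic, e.g. avg log fold change
stat[spiked] <- stat[spiked] + 3       # the spiked genes really are shifted

ord   <- order(abs(stat), decreasing = TRUE)  # gene list, best first
is.tp <- ord %in% spiked
tp    <- cumsum(is.tp)                 # true positives at each list length
fp    <- seq_along(ord) - tp           # false positives at each list length
ppv   <- tp / (tp + fp)                # PPV at every possible cutoff

keep <- fp <= 100                      # the practical region, as in affycomp
plot(fp[keep], tp[keep], type = "s",
     xlab = "False positives", ylab = "True positives")

Reporting the whole curve (or its area) avoids the single-cutoff problem described above for Table 1.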
Eric Blalock ▴ 250
@eric-blalock-78
Last seen 9.6 years ago
I knew when I saw your reply in my mailbox that I was in trouble :/

Thanks for the clarification. I hadn't looked carefully at the PPV references, but I believe I may have done the authors a disservice by implying that they had developed PPV rather than adopted it (this was my mistaken impression and not their claim). I'd leave it to the authors to address the shortfalls you point out, and I agree that the paper may inappropriately or unfairly compare Logit-t to RMA and dChip.

My enthusiasm regarding the paper is that it is the first one (to my knowledge) that has interrogated probe-set differences at the probe level across groups (rather than pairwise), especially since the authors reported what I suspected: that the probe-level data do a better job than the expression values generated by a probe-level algorithm (I could still be totally wrong, but it IS what I have been suspecting). This may be more important for discerning meaningful differences than the transforms themselves. If it is possible to interrogate the data at this level, why bother with probe-level algorithms, which may lose information in the process of cooking 11-32 intensity values into a single number? Personally, I'd be willing to tolerate the extra statistical complexity to get results that more accurately reflect the biological processes under investigation.

I realize that the data set you had to work with was relatively small (n = 3 per group), but would you still advocate average log fold change as a discriminator if you had, say, 10 chips in each group? While it would probably still work well for spike-in data, how would it do on real live biological samples? We are working very hard to dispel the notion that one can find accurate microarray results with an n of 3 per group, particularly in animal studies. This has never been acceptable in univariate work (at least in our field of research), and there is nothing magical about microarray technology that makes GeneChips more capable of assessing biological variance when only a few biological replicates are present. In our grant writing, comments to other researchers in the neurosciences, and advice from our microarray core, we are strongly advocating sufficient replication and statistical determination of significant differences.

Cheers,
-E
See below.

On Fri, 26 Sep 2003, Eric wrote:

> My enthusiasm regarding the paper is that it is the first one (to my
> knowledge) that has interrogated probe-set differences at the probe level
> across groups (rather than pairwise) ... If it is possible to interrogate
> the data at this level, why bother with probe-level algorithms, which may
> lose information in the process of cooking 11-32 intensity values into a
> single number?

I agree that using probe-level data to estimate differential expression, skipping the calculation of an expression measure, is a good idea. In fact, we have tried it using our model and rlm, but it didn't work any better. We continue to look at this issue. My criticism was of the assessments: just because this is a good idea doesn't mean their specific method will be any good, although I suspect that a good assessment will in fact show it is a decent procedure.

> I realize that the data set you had to work with was relatively small
> (n = 3 per group), but would you still advocate average log fold change
> as a discriminator if you had, say, 10 chips in each group? ... We are
> working very hard to dispel the notion that one can find accurate
> microarray results with an n of 3 per group, particularly in animal
> studies.

I agree with you 100%. Biological variance complicates everything, but we have no assessment data with biological reps. Hopefully we will have some soon. As soon as we have something like this, we will be able to assess which tests work best in which situations. With N=3, I doubt the t-test will win even with biological variance present.
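For readers unfamiliar with the "our model and rlm" reference: RMA-style summarization fits an additive probe-affinity plus chip-effect model to the log intensities of a probe set, robustly. A minimal sketch of that style of model with MASS::rlm on simulated data (an illustration only, not the actual code being alluded to):

library(MASS)

set.seed(5)
d <- expand.grid(probe = factor(1:11), chip = factor(1:6))
chip.eff <- c(0, 0, 0, 0.5, 0.5, 0.5)          # arrays 4-6 are "up"
d$logint <- 7 +
  rnorm(11, sd = 0.4)[as.integer(d$probe)] +   # probe affinity effects
  chip.eff[as.integer(d$chip)] +
  rnorm(nrow(d), sd = 0.1)                     # noise

fit <- rlm(logint ~ chip + probe, data = d)    # robust additive fit
coef(fit)[grep("^chip", names(coef(fit)))]     # per-chip expression effects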
> > My enthusiasm regarding the paper is that it is the first one (to my
> > knowledge) that has interrogated probe-set differences at the probe
> > level across groups (rather than pairwise) ...

I haven't had a chance to read this paper yet, but I am looking forward to it... However, have you seen:

Chu, Weir & Wolfinger. A systematic statistical linear modeling approach to oligonucleotide array experiments. Math Biosci 176(1): 35-51, Mar 2002.

They advocate using the probe-level data in a linear mixed model. Assuming that each probe is an independent measure (which I know is not true because many of them overlap, but I'm ignoring this for now), using probe-level data gives 14-20 "replicates" per chip. We've based our analysis methods on this, and with two biological replicates per genetic line and three genetic lines per phenotypic group, we've been able to detect as little as a 15% difference in gene expression at p=0.0001 (we expect only 2 FP and get 60 genes with p=0.0001). I should have the manuscript submitted in a couple of weeks, and I'm working on using our method on the benchmark data to see how it compares. Results to be discussed...

Cheers,
Jenny

--
Jenny Drnevich, Ph.D.
Department of Animal Biology
University of Illinois, Urbana-Champaign
515 Morrill Hall, 505 S. Goodwin Ave.
Urbana, IL 61801, USA
ph: 217-244-6826  fax: 217-244-4565
drnevich@uiuc.edu
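A minimal sketch of the kind of probe-level mixed model Chu, Weir & Wolfinger advocate, fit for a single gene with nlme (the data are simulated, all column names are hypothetical, and this illustrates the approach rather than reproducing the authors' code):

library(nlme)

set.seed(2)
d <- expand.grid(probe = factor(1:14), chip = factor(1:6))
d$phenotype <- factor(ifelse(as.integer(d$chip) <= 3, "A", "B"))
d$logint <- 8 +
  0.2 * (d$phenotype == "B") +                 # ~15% difference on log2 scale
  rnorm(14, sd = 0.5)[as.integer(d$probe)] +   # probe affinity effects
  rnorm(nrow(d), sd = 0.1)                     # measurement noise

fit <- lme(logint ~ phenotype + probe,
           random = ~ 1 | chip,                # chip-to-chip variation
           data = d)
anova(fit)   # the phenotype row is the test for differential expression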
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia
At 02:54 AM 27/09/2003, Jenny Drnevich wrote:

> > My enthusiasm regarding the paper is that it is the first one (to my
> > knowledge) that has interrogated probe-set differences at the probe
> > level across groups (rather than pairwise),

This is what Ben Bolstad's AffyExtensions package does.

> However, have you seen: Chu, Weir & Wolfinger. A systematic statistical
> linear modeling approach to oligonucleotide array experiments. Math
> Biosci 176(1): 35-51, Mar 2002. They advocate using the probe-level data
> in a linear mixed model. ... We've based our analysis methods on this,
> and with two biological replicates per genetic line and three genetic
> lines per phenotypic group, we've been able to detect as little as a 15%
> difference in gene expression at p=0.0001 (we only expect 2 FP and get
> 60 genes with p=0.0001).

Mmmm. Getting very low p-values from just two biological replicates doesn't lead you to question the validity of the p-values?? :)

Gordon
See below...

> > ... with two biological replicates per genetic line and three genetic
> > lines per phenotypic group, we've been able to detect as little as a
> > 15% difference in gene expression at p=0.0001 (we only expect 2 FP and
> > get 60 genes with p=0.0001).
>
> Mmmm. Getting very low p-values from just two biological replicates
> doesn't lead you to question the validity of the p-values?? :)

But we don't just have two biological replicates. We're interested in consistent gene expression differences between phenotype 1 and phenotype 2. We looked at three different genetic lines showing phenotype 1 and three other lines that had phenotype 2. We made two biological replicates of each line, and the expression level of each gene was estimated by 14 probes. By running a mixed-model ANOVA separately for each gene, with phenotype, line (nested within phenotype), probe, and all second-order interactions, the phenotype comparison has around 120 df (or so, off the top of my head). That's how we can detect a 15% difference in gene expression. As long as the statistical model is set up correctly, I never "question" the validity of p-values, although I might question the biological significance... :)
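The df arithmetic behind that reply, sketched in R (assuming a complete layout; the exact residual df depends on which interactions enter the model and which stratum supplies the denominator for the phenotype test, so the "around 120" above and the count below are both ballpark figures):

n.pheno <- 2     # phenotypes
n.line  <- 3     # genetic lines per phenotype
n.rep   <- 2     # biological replicates per line
n.probe <- 14    # probes per gene

n.obs <- n.pheno * n.line * n.rep * n.probe          # 168 observations per gene

df.pheno       <- n.pheno - 1                        # 1
df.line        <- n.pheno * (n.line - 1)             # 4 (line within phenotype)
df.probe       <- n.probe - 1                        # 13
df.pheno.probe <- df.pheno * df.probe                # 13
df.line.probe  <- df.line * df.probe                 # 52

n.obs - 1 - df.pheno - df.line - df.probe -
  df.pheno.probe - df.line.probe                     # 84 residual df

Either way, most of the df comes from the 14 probes per gene, which, as Jenny herself notes above, are not independent measures, so the effective df is smaller than the nominal count.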
Remember, a p-value means "the chance of seeing something as extreme as we saw, given the null." If the null isn't true, then the p-value no longer means what we think it means. Beware that many ANOVA models make assumptions about normality that are hard to defend when studying microarray data. With so few arrays we can't rely on the central limit theorem, so we are stuck hoping that the normality assumptions hold, and they become part of the null hypothesis. I think sometimes we are over-optimistic in thinking "the statistical model is set up correctly"...

... and then you have the multiple comparison problem!

-r
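One standard way around the normality worry is a permutation null: recompute the statistic under random relabelings of the arrays. A minimal sketch for a single gene (simulated data; note that with 3 vs 3 arrays there are only choose(6,3) = 20 distinct labelings, so the attainable p-values are coarse, a caveat in itself):

set.seed(3)
y     <- c(rnorm(3, mean = 8.5), rnorm(3, mean = 8.0))   # 3 vs 3 arrays
group <- rep(c("A", "B"), each = 3)

obs <- mean(y[group == "A"]) - mean(y[group == "B"])     # observed difference

perm <- replicate(2000, {
  g <- sample(group)                                     # relabel the arrays
  mean(y[g == "A"]) - mean(y[g == "B"])
})
mean(abs(perm) >= abs(obs))                              # two-sided p-value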
I recommend to all to read Berger and Sellke (1987), J. Amer. Statist. Assoc. 82:112-122, on the p-value story and on calculations indicating how surprised we ought to be given the p-value.

Michael N.
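For a flavor of those "how surprised should we be" calculations: a commonly cited calibration (from the follow-up paper by Sellke, Bayarri & Berger, 2001, rather than Berger & Sellke 1987 itself) bounds the Bayes factor in favor of the null from below by -e*p*log(p):

p <- c(0.05, 0.01, 0.001)
B <- -exp(1) * p * log(p)            # lower bound on the Bayes factor for H0
round(cbind(p = p, min.bayes.factor = B), 3)
## p = 0.05 gives B >= 0.41: posterior odds against the null of at most
## about 2.5 to 1 (under equal prior odds), much weaker evidence than the
## "1 in 20" reading of a p-value suggests.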
Hi Jenny,

Your setup has several valid permutations which can be used to account for the design and for multiple testing. You can also try to estimate the proportion of genes different from the null; the FDR q-value might be of more interest in this case. See "Statistical significance for genomewide studies" by John D. Storey and Robert Tibshirani.

Justin
http://naturalvariation.org
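A minimal sketch of the FDR idea Justin points to, using the Benjamini-Hochberg adjustment in base R via p.adjust (Storey and Tibshirani's q-values, which additionally estimate the proportion of true nulls, are implemented in the Bioconductor qvalue package; the p-values below are simulated for illustration):

set.seed(4)
p <- c(rbeta(200, 0.5, 20),          # genes with a real change
       runif(9800))                  # genes with no change

fdr <- p.adjust(p, method = "BH")    # Benjamini-Hochberg adjusted p-values
sum(fdr < 0.05)                      # size of the gene list at 5% FDR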
Dear Walter,
I can make a quick meeting Tuesday at 3.30 in Sequoia 102, my Statistics office.
Best,
Susan Holmes