LARGE Change in GCRMA expression values between versions (REPEAT QUESTION: PLEASE REPLY)

0

Entering edit mode

Richard Friedman ★ 2.0k

@richard-friedman-513

Last seen 9.7 years ago

Dear Zhijin, (I sent the following in May and June to the list (worded slightly differently) but did not receive a reply. I would appreciate it if you, or someone on the list could could clear up this problem) I am noticing a large change in the absolute values of intensity measurements For the same probeset and and array normalized with the same 8 other arrays done with GCRMA 2.10 I got 5.27 but for GCRMA 2.12 I got 3.14 Does this sound like a change that can be expected between versions 2.10 and 2.12, or does it sound as if I had made an error of some kind? Is the change due to revising GCRMA to take the criticism of Lim et al. into account? Is so has version 2.12 been benchmarked against affycomp? Thanks and best wishes, Rich ------------------------------------------------------------ Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet) Box 95, Room 130BB or P&S 1-420C Columbia University Medical Center 630 W. 168th St. New York, NY 10032 (212)305-6901 (5-6901) (voice) friedman at cancercenter.columbia.edu http://cancercenter.columbia.edu/~friedman/ In Memoriam, Algirdas Jonas Budrys

Cancer gcrma Cancer gcrma • 1.1k views

ADD COMMENT • link updated 15.8 years ago by Zhijin Wu ▴ 260 • written 15.8 years ago by Richard Friedman ★ 2.0k

0

Entering edit mode

Zhijin Wu ▴ 260

@zhijin-wu-2378

Last seen 9.7 years ago

Yes, the 2.12 version has made the change so probes without affinity information do not go through GSB(gene specific binding) adjustment. So there may be a small number of probesets that are affected. best Zhijin Richard Friedman wrote: > Dear Zhijin, > > (I sent the following in May and June to the list (worded > slightly differently) but did not receive a reply. > I would appreciate it if you, or someone on the list could could > clear up this problem) > > I am noticing a large change in the absolute values of > intensity measurements > > For the same probeset and and array normalized with the same 8 other > arrays done with > GCRMA 2.10 I got 5.27 > but for > GCRMA 2.12 I got 3.14 > > Does this sound like a change that can be expected between versions > 2.10 and 2.12, or does it sound as if I had made an error of some kind? > > Is the change due to revising GCRMA to take the criticism of Lim et > al. into account? > Is so has version 2.12 been benchmarked against affycomp? > > > Thanks and best wishes, > Rich > ------------------------------------------------------------ > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet) > Box 95, Room 130BB or P&S 1-420C > Columbia University Medical Center > 630 W. 168th St. > New York, NY 10032 > (212)305-6901 (5-6901) (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In Memoriam, > Algirdas Jonas Budrys > > >

ADD COMMENT • link 15.8 years ago Zhijin Wu ▴ 260

0

Entering edit mode

Zhijin, Thank you for your reply. One more question. Has GCRMA version 2.12 been benchmarked for accuracy? That is to say is it as accurate as version 2.10 for differential expression? Best wishes, Rich On Jul 8, 2008, at 12:22 PM, Zhijin Wu wrote: > Yes, the 2.12 version has made the change so probes without > affinity information do not go through GSB(gene specific binding) > adjustment. So there may be a small number of probesets that are > affected. > > best > Zhijin > Richard Friedman wrote: >> Dear Zhijin, >> >> (I sent the following in May and June to the list (worded >> slightly differently) but did not receive a reply. >> I would appreciate it if you, or someone on the list could could >> clear up this problem) >> >> I am noticing a large change in the absolute values of >> intensity measurements >> >> For the same probeset and and array normalized with the same 8 >> other >> arrays done with >> GCRMA 2.10 I got 5.27 >> but for >> GCRMA 2.12 I got 3.14 >> >> Does this sound like a change that can be expected between versions >> 2.10 and 2.12, or does it sound as if I had made an error of some >> kind? >> >> Is the change due to revising GCRMA to take the criticism of Lim >> et al. into account? >> Is so has version 2.12 been benchmarked against affycomp? >> >> >> Thanks and best wishes, >> Rich >> ------------------------------------------------------------ >> Richard A. Friedman, PhD >> Associate Research Scientist, >> Biomedical Informatics Shared Resource >> Herbert Irving Comprehensive Cancer Center (HICCC) >> Lecturer, >> Department of Biomedical Informatics (DBMI) >> Educational Coordinator, >> Center for Computational Biology and Bioinformatics (C2B2)/ >> National Center for Multiscale Analysis of Genomic Networks (MAGNet) >> Box 95, Room 130BB or P&S 1-420C >> Columbia University Medical Center >> 630 W. 168th St. >> New York, NY 10032 >> (212)305-6901 (5-6901) (voice) >> friedman at cancercenter.columbia.edu >> http://cancercenter.columbia.edu/~friedman/ >> >> In Memoriam, >> Algirdas Jonas Budrys >> >> >> >

ADD REPLY • link 15.8 years ago Richard Friedman ★ 2.0k

0

Entering edit mode

Hi, Since we by default use GCRMA in our pipeline, I (also) had a look how the newest GCRMA library (2.12.1) performed on several of our datasets. I must admit the outcomes were rather startling to me.... Based on some testing we did few years ago we decided to use GCRMA _only_ using the option fast=FALSE. Thus: gcrma.data <- gcrma(data, fast=FALSE) The reason for this was mainy because when comparing the effects of a synthetic drug (agonist) in wild-type and receptor-null mice we noticed a very strange p-value distribution when we used the option 'fast=TRUE'. As expected, it peaked at the left (= because of the differential expessed genes), but instead of gradually fading out to the right we noticed a small bump in the middle of this distribution graph. This became extremely prominent when we compared the effects of the drug in the receptor null mice; we clearly observed a bump in the middle of this curve although we expected an almost flat curve.... (The receptor this drug binds to was knocked out in these null mice, so we _knew_ treatment with this drug should only have a very little effect, if all, on gene expression! [in contrast to the WT mice]). Remarkably, this unexpected bump completely disappeared when using the same CEL files and GCRMA with the option 'fast=FALSE'. See slides 3 and 4 here: http://members.chello.nl/g.hooiveld/pics/GCRMA_evaluation.pdf We therefore concluded that we better should not use the maximum likelihood estimate (MLE) for background, but rather the Emperical Bayes (EB) estimate for BG correction. Wei Keat Lim et al. then reported about a specific problem with GCRMA, and proposed a modification. Since as a non-statistician I could not fully understand what his modification did, I decided to ask him about some practical aspects (see below for full conversation). Based on this I learned that both ML and EB estimates for bakckground are affected, although at different levels. So, considering the above, with the release of GCRMA v2.12.1 I decided to analyse/compare the before mentioned arrays again. Results of this are at slides 1 and 2 [note however that the width of the bins is smaller than those in slides 3&4, but this does not affect the message]. Thus, I expected that the latest GCRMA would not show this strange bump in the p-value distribution plots anymore, regardles whether the ML (fast=TRUE) or EB (fast=FALSE) estimate for BG was used. However, this is in contrast with what I observed, for fast=TRUE I noticed this bump is still there, but (this is new) also that the p-values 'peaked' completely at the right of the graph (p=0.95-1), both in the WT and receptor null mice! Surprisingly, for fast=FALSE the p-value distr looked as expected.... What the exact cause (and solution) is for this strange behaviour I leave to the real knowledgeable people/experts, but I thought it is good to share my observations. Regards, Guido E-mail conversation with Wei Keat Lim in August 2007: Q (Guido): do you expect that the specific artifacts introduced using the ML estimate also occur when using the EB estimate (because our data suggest EB seems not to be affected)? A (Lim): Yes, the artifact introduced in GCRMA is independent of the method used (MLE or EB). You observed a difference in your results because the default threshold used to truncate uninformative probe is lower in EB. The threshold is a function of 'fast', i.e. k=6*fast+0.5*(1-fast). Hence, k=6 in MLE and k=0.5 in EB. The artifact still exists but less obvious in EB implementation. I then followed up on this: Q (Guido): > Thus, based on these analyses I conclude that when using the EB estimate for NSB calculation, your mod doesn't really alter/improve the expression level determination. However, in your first reply you mentioned that your mod should also affect the "slow" gcRMA. Based on my data, do you agree this is only marginally and therefore there is no real need to apply your mod? Or did I overlook something and am I completely wrong? Related to this, have you checked how the "slow" default gcRMA is positioned in Fig8 of your paper? A (WKL): > Thanks a lot for sharing the information. The part in gcrma that we suggest to modify does not relate to MLE or EB methods. The fact that you observed a difference between them is just because they are using a different cutoff value (k) by default. You will see a big difference if you choose k=6 (default in MLE) in EB implementation. > Another thing to note is that we did not expect to see big changes in differential expression analysis, and what we showed earlier is just the effects in gene-gene statistical dependencies. You can always use gcrma-EB if that does not affect downstream analysis in your dataset. We just proposed a correction that could eliminate artificial gene-gene correlation in the implementation, independent of MLE, EB and parameter k. > Thanks, > Wei Keat This was followed by: Q (Guido): I just read your paper again and you clearly show that your GCRMA modification improves the estimation of gene-gene correlations. Since you applied the MLE method of GCRMA this effect is [likely] more pronounced than when applying the EB implementation (for reasons you described; it's dependent on setting of k). Since we normally use the EB implementation, I used your code on one of our datasets, and found that this did not really affect the calculated gene expression levels and subsequent statistical analysis. Therefore I concluded that for us there is no need to use your modded GCRMA algorithm. However, this may be a wrong conclusion since you adjusted the GSB procedure, which is the same for the MLE or EB implementation. In other words, it still would be wise to use your modified GCRMA code, because I did not look at parameters that specifically 'tested' the impact of the mod (which you did in your paper). Thus form our point of view: your modification will at least gives outcomes identical to the default GCRMA (EB imlementation) ("worst" case), but it will likely (although slightly) improve our analyses as well. Guido A (WKL): Guido, Yes. You got the point :) It's possible that you observe changes in gene-gene correlation but not in differential expression. Imagine that there are 2 genes in 10 samples. Both of them have low expression in control samples and high expression in treatment samples. Artificial gene-gene correlation in these samples does not change the fact that level of expression is different between control and treatment. The effect is just to increase the correlation among control/treatment samples. Best, Wei Keat ------------------------------------------------ Guido Hooiveld, PhD Nutrition, Metabolism & Genomics Group Division of Human Nutrition Wageningen University Biotechnion, Bomenweg 2 NL-6703 HD Wageningen the Netherlands tel: (+)31 317 485788 fax: (+)31 317 483342 internet: http://nutrigene.4t.com email: guido.hooiveld at wur.nl > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Zhijin Wu > Sent: 08 July 2008 18:22 > To: Richard Friedman > Cc: Rafael A. Irizarry; bio c bioconductor > Subject: Re: [BioC] LARGE Change in GCRMA expression values > between versions (REPEAT QUESTION: PLEASE REPLY) > > Yes, the 2.12 version has made the change so probes without > affinity information do not go through GSB(gene specific > binding) adjustment. So there may be a small number of > probesets that are affected. > > best > Zhijin > Richard Friedman wrote: > > Dear Zhijin, > > > > (I sent the following in May and June to the list (worded > > slightly differently) but did not receive a reply. > > I would appreciate it if you, or someone on the list could could > > clear up this problem) > > > > I am noticing a large change in the absolute values of > intensity > > measurements > > > > For the same probeset and and array normalized with the same 8 > > other arrays done with GCRMA 2.10 I got 5.27 but for GCRMA > 2.12 I got > > 3.14 > > > > Does this sound like a change that can be expected between versions > > 2.10 and 2.12, or does it sound as if I had made an error > of some kind? > > > > Is the change due to revising GCRMA to take the criticism > of Lim et > > al. into account? > > Is so has version 2.12 been benchmarked against affycomp? > > > > > > Thanks and best wishes, > > Rich > > ------------------------------------------------------------ > > Richard A. Friedman, PhD > > Associate Research Scientist, > > Biomedical Informatics Shared Resource Herbert Irving Comprehensive > > Cancer Center (HICCC) Lecturer, Department of Biomedical > Informatics > > (DBMI) Educational Coordinator, Center for Computational > Biology and > > Bioinformatics (C2B2)/ National Center for Multiscale Analysis of > > Genomic Networks (MAGNet) Box 95, Room 130BB or P&S 1-420C Columbia > > University Medical Center 630 W. 168th St. > > New York, NY 10032 > > (212)305-6901 (5-6901) (voice) > > friedman at cancercenter.columbia.edu > > http://cancercenter.columbia.edu/~friedman/ > > > > In Memoriam, > > Algirdas Jonas Budrys > > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 15.8 years ago Guido Hooiveld ★ 3.9k

0

Entering edit mode

Hi One thing I have noticed with the "new" GCRMA (2.12.x) is that for low-expressed probesets, the expression values are exactly the same across arrays. I'm presuming that this explains the peak in p-values near 1 that Guido is seeing. I also note that the "infinitesimal amount of uniformly distributed noise" mentioned in the Lim et al paper is not added in the gcrma package. I think the result of this is you are likely to see many probesets with zero difference between conditions, and hence presumably a p-value of 1. When the Lim et al paper was presented at ISMB, I asked whether the new version of GCRMA had been benchmarked using benchmarks such as affycomp. I was told that it hadn't been but that it would be. FYI, I am currently working on a benchmark for combinations of summarisation/differential expression methods, and will include the "old" and "new" versions of GCRMA. Best wishes Richard. Hooiveld, Guido wrote: > Hi, > > Since we by default use GCRMA in our pipeline, I (also) had a look how > the newest GCRMA library (2.12.1) performed on several of our datasets. > I must admit the outcomes were rather startling to me.... > > Based on some testing we did few years ago we decided to use GCRMA > _only_ using the option fast=FALSE. Thus: > gcrma.data <- gcrma(data, fast=FALSE) > > The reason for this was mainy because when comparing the effects of a > synthetic drug (agonist) in wild-type and receptor-null mice we noticed > a very strange p-value distribution when we used the option 'fast=TRUE'. > As expected, it peaked at the left (= because of the differential > expessed genes), but instead of gradually fading out to the right we > noticed a small bump in the middle of this distribution graph. This > became extremely prominent when we compared the effects of the drug in > the receptor null mice; we clearly observed a bump in the middle of this > curve although we expected an almost flat curve.... (The receptor this > drug binds to was knocked out in these null mice, so we _knew_ treatment > with this drug should only have a very little effect, if all, on gene > expression! [in contrast to the WT mice]). > > Remarkably, this unexpected bump completely disappeared when using the > same CEL files and GCRMA with the option 'fast=FALSE'. > See slides 3 and 4 here: > http://members.chello.nl/g.hooiveld/pics/GCRMA_evaluation.pdf > We therefore concluded that we better should not use the maximum > likelihood estimate (MLE) for background, but rather the Emperical Bayes > (EB) estimate for BG correction. > > > > Wei Keat Lim et al. then reported about a specific problem with GCRMA, > and proposed a modification. Since as a non-statistician I could not > fully understand what his modification did, I decided to ask him about > some practical aspects (see below for full conversation). Based on this > I learned that both ML and EB estimates for bakckground are affected, > although at different levels. > > So, considering the above, with the release of GCRMA v2.12.1 I decided > to analyse/compare the before mentioned arrays again. Results of this > are at slides 1 and 2 [note however that the width of the bins is > smaller than those in slides 3&4, but this does not affect the message]. > Thus, I expected that the latest GCRMA would not show this strange bump > in the p-value distribution plots anymore, regardles whether the ML > (fast=TRUE) or EB (fast=FALSE) estimate for BG was used. However, this > is in contrast with what I observed, for fast=TRUE I noticed this bump > is still there, but (this is new) also that the p-values 'peaked' > completely at the right of the graph (p=0.95-1), both in the WT and > receptor null mice! Surprisingly, for fast=FALSE the p-value distr > looked as expected.... > > What the exact cause (and solution) is for this strange behaviour I > leave to the real knowledgeable people/experts, but I thought it is good > to share my observations. > > Regards, > Guido > > > > > > E-mail conversation with Wei Keat Lim in August 2007: > Q (Guido): do you expect that the specific artifacts introduced using > the ML estimate also occur when using the EB estimate (because our data > suggest EB seems not to be affected)? > A (Lim): Yes, the artifact introduced in GCRMA is independent of the > method used (MLE or EB). You observed a difference in your results > because the default threshold used to truncate uninformative probe is > lower in EB. The threshold is a function of 'fast', i.e. > k=6*fast+0.5*(1-fast). Hence, k=6 in MLE and k=0.5 in EB. The artifact > still exists but less obvious in EB implementation. > > > I then followed up on this: > > Q (Guido): >> Thus, based on these analyses I conclude that when using the EB > estimate for NSB calculation, your mod doesn't really alter/improve the > expression level determination. However, in your first reply you > mentioned that your mod should also affect the "slow" gcRMA. Based on my > data, do you agree this is only marginally and therefore there is no > real need to apply your mod? Or did I overlook something and am I > completely wrong? Related to this, have you checked how the "slow" > default gcRMA is positioned in Fig8 of your paper? > A (WKL): >> Thanks a lot for sharing the information. The part in gcrma that we > suggest to modify does not relate to MLE or EB methods. The fact that > you observed a difference between them is just because they are using a > different cutoff value (k) by default. You will see a big difference if > you choose k=6 (default in MLE) in EB implementation. >> Another thing to note is that we did not expect to see big changes in > differential expression analysis, and what we showed earlier is just the > effects in gene-gene statistical dependencies. You can always use > gcrma-EB if that does not affect downstream analysis in your dataset. We > just proposed a correction that could eliminate artificial gene-gene > correlation in the implementation, independent of MLE, EB and parameter > k. >> Thanks, >> Wei Keat > > > This was followed by: > Q (Guido): > I just read your paper again and you clearly show that your GCRMA > modification improves the estimation of gene-gene correlations. Since > you applied the MLE method of GCRMA this effect is [likely] more > pronounced than when applying the EB implementation (for reasons you > described; it's dependent on setting of k). > Since we normally use the EB implementation, I used your code on one of > our datasets, and found that this did not really affect the calculated > gene expression levels and subsequent statistical analysis. Therefore I > concluded that for us there is no need to use your modded GCRMA > algorithm. However, this may be a wrong conclusion since you adjusted > the GSB procedure, which is the same for the MLE or EB implementation. > In other words, it still would be wise to use your modified GCRMA code, > because I did not look at parameters that specifically 'tested' the > impact of the mod (which you did in your paper). Thus form our point of > view: your modification will at least gives outcomes identical to the > default GCRMA (EB imlementation) ("worst" case), but it will likely > (although slightly) improve our analyses as well. > Guido > > A (WKL): > Guido, > Yes. You got the point :) > It's possible that you observe changes in gene-gene correlation but not > in differential expression. Imagine that there are 2 genes in 10 > samples. Both of them have low expression in control samples and high > expression in treatment samples. Artificial gene-gene correlation in > these samples does not change the fact that level of expression is > different between control and treatment. The effect is just to increase > the correlation among control/treatment samples. > Best, > Wei Keat > > > > ------------------------------------------------ > Guido Hooiveld, PhD > Nutrition, Metabolism & Genomics Group > Division of Human Nutrition > Wageningen University > Biotechnion, Bomenweg 2 > NL-6703 HD Wageningen > the Netherlands > tel: (+)31 317 485788 > fax: (+)31 317 483342 > internet: http://nutrigene.4t.com > email: guido.hooiveld at wur.nl > > > >> -----Original Message----- >> From: bioconductor-bounces at stat.math.ethz.ch >> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Zhijin Wu >> Sent: 08 July 2008 18:22 >> To: Richard Friedman >> Cc: Rafael A. Irizarry; bio c bioconductor >> Subject: Re: [BioC] LARGE Change in GCRMA expression values >> between versions (REPEAT QUESTION: PLEASE REPLY) >> >> Yes, the 2.12 version has made the change so probes without >> affinity information do not go through GSB(gene specific >> binding) adjustment. So there may be a small number of >> probesets that are affected. >> >> best >> Zhijin >> Richard Friedman wrote: >>> Dear Zhijin, >>> >>> (I sent the following in May and June to the list (worded >>> slightly differently) but did not receive a reply. >>> I would appreciate it if you, or someone on the list could could >>> clear up this problem) >>> >>> I am noticing a large change in the absolute values of >> intensity >>> measurements >>> >>> For the same probeset and and array normalized with the same 8 >>> other arrays done with GCRMA 2.10 I got 5.27 but for GCRMA >> 2.12 I got >>> 3.14 >>> >>> Does this sound like a change that can be expected between versions >>> 2.10 and 2.12, or does it sound as if I had made an error >> of some kind? >>> Is the change due to revising GCRMA to take the criticism >> of Lim et >>> al. into account? >>> Is so has version 2.12 been benchmarked against affycomp? >>> >>> >>> Thanks and best wishes, >>> Rich >>> ------------------------------------------------------------ >>> Richard A. Friedman, PhD >>> Associate Research Scientist, >>> Biomedical Informatics Shared Resource Herbert Irving Comprehensive >>> Cancer Center (HICCC) Lecturer, Department of Biomedical >> Informatics >>> (DBMI) Educational Coordinator, Center for Computational >> Biology and >>> Bioinformatics (C2B2)/ National Center for Multiscale Analysis of >>> Genomic Networks (MAGNet) Box 95, Room 130BB or P&S 1-420C Columbia >>> University Medical Center 630 W. 168th St. >>> New York, NY 10032 >>> (212)305-6901 (5-6901) (voice) >>> friedman at cancercenter.columbia.edu >>> http://cancercenter.columbia.edu/~friedman/ >>> >>> In Memoriam, >>> Algirdas Jonas Budrys >>> >>> >>> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Richard D. Pearson richard.pearson at postgrad.manchester.ac.uk School of Computer Science, http://www.cs.man.ac.uk/~pearsonr University of Manchester, Tel: +44 161 275 6178 Oxford Road, Mob: +44 7971 221181 Manchester M13 9PL, UK. Fax: +44 161 275 6204

ADD REPLY • link 15.8 years ago Richard Pearson ▴ 390

Login before adding your answer.